SoFunction
Updated on 2025-03-04

Example of method to crawl Kugou playlist by PHPCrawl crawler library

This article example describes the method of catching Kugou playlist by PHPCrawl crawler library. Share it for your reference, as follows:

After watching videos related to web crawlers, I felt itchy and wanted to crawl something. The emoticons battle on Facebook has been very fierce recently, so I wanted to climb down all the emoticons, but I couldn't find a suitable VPN for a while, so I captured Kugou's selected songs and brief introductions in the past January locally. The code is a bit messy, I am not very satisfied with it, and I don’t want to put it on to embarrass you. But then I thought about it, it was my first time crawling, so... I had the following unsatisfactory code~~~ (Since the amount of data crawled is small, I didn't consider multiple processes or anything else. However, I looked at the PHPCrawl documentation and found that the PHPCrawl library has encapsulated all the functions I can think of, which is very convenient to implement)

<?php
header("Content-type:text/html;charset=utf-8");
// It may take a whils to crawl a site ...
set_time_limit(10000);
include("libs/");
class MyCrawler extends PHPCrawler {
  function handleDocumentInfo($DocInfo) {
    // Just detect linebreak for output ("\n" in CLI-mode, otherwise "<br>").
    if (PHP_SAPI == "cli") $lb = "\n";
    else $lb = "<br />";
    $url = $DocInfo->url;
    $pat = "/http:\/\/www\.kugou\.com\/yy\/special\/single\/\d+\.html/";
    if(preg_match($pat,$url) > 0){
    $this->parseSonglist($DocInfo);
    }
    flush();
  }
  public function parseSonglist($DocInfo){
    $content = $DocInfo->content;
    $songlistArr = array();
    $songlistArr['raw_url'] = $DocInfo->url;
    //Analysis of song introduction    $matches = array();
    $pat = "/<span>name:<\/span>([^(<br)]+)<br/";
    $ret = preg_match($pat,$content,$matches);
    if($ret>0){
      $songlistArr['title'] = $matches[1];
    }else{
      $songlistArr['title'] = '';
    }
    //Analysis of the song    $pat = "/<a title=\"([^\"]+)\" hidefocus=\"/";
    $matches = array();
    preg_match_all($pat,$content,$matches);
    $songlistArr['songs'] = array();
    for($i = 0;$i < count($matches[0]);$i++){
      $song_title = $matches[1][$i];
      array_push($songlistArr['songs'],array('title'=>$song_title));
    }
    echo "<pre>";
    print_r($songlistArr);
    echo "</pre>";
    }
  }
$crawler = new MyCrawler();
// URL to crawl
$start_url="/yy/special/index/";
$crawler->setURL($start_url);
// Only receive content of files with content-type "text/html"
$crawler->addContentTypeReceiveRule("#text/html#");
//Link extension$crawler->addURLFollowRule("#http://www\.kugou\.com/yy/special/single/\d+\.html$# i");
$crawler->addURLFollowRule("#\.com/yy/special/index/\d+-\d+-2\.html$# i");
// Store and send cookie-data like a browser does
$crawler->enableCookieHandling(true);
// Set the traffic-limit to 1 MB(1000 * 1024) (in bytes,
// for testing we dont want to "suck" the whole site)
//The crawl size is unlimited$crawler->setTrafficLimit(0);
// Thats enough, now here we go
$crawler->go();
// At the end, after the process is finished, we print a short
// report (see method getProcessReport() for more information)
$report = $crawler->getProcessReport();
if (PHP_SAPI == "cli") $lb = "\n";
else $lb = "<br />";
echo "Summary:".$lb;
echo "Links followed: ".$report->links_followed.$lb;
echo "Documents received: ".$report->files_received.$lb;
echo "Bytes received: ".$report->bytes_received." bytes".$lb;
echo "Process runtime: ".$report->process_runtime." sec".$lb; 
?>

PS: Here are two very convenient regular expression tools for your reference:

JavaScript regular expression online testing tool:
http://tools./regex/javascript

Regular expression online generation tool:
http://tools./regex/create_reg

For more information about PHP related content, please check out the topic of this site:Summary of usage of php regular expressions》、《Complete collection of PHP array (Array) operation techniques》、《Introduction to PHP basic syntax》、《Summary of PHP operations and operator usage》、《PHP object-oriented programming tutorial》、《Summary of PHP network programming skills》、《Summary of usage of php strings》、《PHP+mysql database operation tutorial"and"Summary of common database operation techniques for php

I hope this article will be helpful to everyone's PHP programming.