What is Snoopy?
Snoopy is a PHP class that imitates a web browser. It can fetch web page content and submit forms.
Some features of Snoopy:
* Fetches the content of a web page easily
* Fetches the text content of a web page (with HTML tags removed)
* Fetches the links on a web page easily
* Supports proxy hosts
* Supports basic username/password authentication
* Supports setting user_agent, referer, cookies, and header content
* Supports browser redirects and can limit the redirect depth
* Can expand links in web pages into fully qualified URLs (the default)
* Submits data and retrieves the response easily
* Supports following HTML frames (added in v0.92)
* Supports passing cookies on redirects (added in v0.92)
To learn more, search for Snoopy online. Here are a few simple examples:
1 Get the specified URL content
PHP Code
$url = "https://";
include("");
$snoopy = new Snoopy;
$snoopy->fetch($url); //Get all content
echo $snoopy->results; //Show results
// The related methods take the same $url argument:
// $snoopy->fetchtext($url);  // Fetch text content only (HTML removed)
// $snoopy->fetchlinks($url); // Fetch the links
// $snoopy->fetchform($url);  // Fetch the forms
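To give a feel for what "text content with the HTML removed" means, here is a minimal sketch that uses only PHP built-ins. It is not Snoopy's own implementation; the helper name html_to_text and the sample markup are hypothetical.

```php
<?php
// Hypothetical helper: approximates "text only" output with PHP built-ins.
// This is NOT how Snoopy's fetchtext() works internally.
function html_to_text(string $html): string
{
    // Drop script and style blocks first, since strip_tags() keeps their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);
    // Remove the remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Made-up sample page.
$sample = '<html><head><style>p{color:red}</style></head>'
        . '<body><p>Hello &amp; <b>welcome</b></p></body></html>';
echo html_to_text($sample), "\n"; // Hello & welcome
```

The sketch only shows the general idea; real pages usually need more careful handling (comments, line breaks, character sets).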
2 Form Submission
PHP Code
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "https://"; // Form submission address
$snoopy->submit($action, $formvars); // $formvars is the array of fields to submit
echo $snoopy->results; // The response returned after the form is submitted
// $snoopy->submittext($action, $formvars);  // Submit, then return only the text (HTML removed)
// $snoopy->submitlinks($action, $formvars); // Submit, then return only the links
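To see what such a submission looks like on the wire, the sketch below builds a request body from the same $formvars array with PHP's http_build_query. It assumes the form is sent URL-encoded, the usual encoding for simple forms; it illustrates the format and is not a copy of Snoopy's internals.

```php
<?php
// The same fields used in the submit() example above.
$formvars = [];
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";

// http_build_query() URL-encodes keys and values and joins them with "&",
// which is the shape of an application/x-www-form-urlencoded request body.
$body = http_build_query($formvars, '', '&');

echo $body, "\n"; // username=admin&pwd=admin
echo "Content-Length: ", strlen($body), "\n";
```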
With form submission working, much more becomes possible. Next, let's disguise the IP address and the browser.
3 Disguise
PHP Code
$formvars["username"] = "admin";
$formvars["pwd"] = "admin";
$action = "https://";
include "";
$snoopy = new Snoopy;
$snoopy->cookies["PHPSESSID"] = 'fc106b1918bd522cc863f36890e6fff7'; // Forged session ID
$snoopy->agent = "(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)"; // Forged browser
$snoopy->referer = "http://s."; // Forged source page address (HTTP Referer)
$snoopy->rawheaders["Pragma"] = "no-cache"; // Cache-control HTTP header
$snoopy->rawheaders["X_FORWARDED_FOR"] = "127.0.0.101"; // Forged IP
$snoopy->submit($action,$formvars);
echo $snoopy->results;
So the session, the browser, and the IP can all be disguised, which makes many things possible.
For example, on a poll protected by a verification code and an IP check, you could vote repeatedly.
PS: The IP here is disguised only in the HTTP header, so an IP obtained through REMOTE_ADDR cannot be disguised.
By contrast, sites that read the IP from the HTTP header (typically to see through proxies) can be given any IP you make up.
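The difference between the two sources can be seen from the server's side. The sketch below is a hypothetical helper, client_ip, that takes an array standing in for $_SERVER; the addresses are made-up examples. It shows that only code which trusts the forwarded header is fooled.

```php
<?php
// Hypothetical server-side helper. $server stands in for $_SERVER.
function client_ip(array $server, bool $trustForwardedHeader): string
{
    // REMOTE_ADDR is the address of the TCP peer; a request header cannot change it.
    $remote = $server['REMOTE_ADDR'] ?? '';

    if ($trustForwardedHeader && !empty($server['HTTP_X_FORWARDED_FOR'])) {
        // The header may hold a comma-separated chain; the first entry is the claimed client.
        $parts = explode(',', $server['HTTP_X_FORWARDED_FOR']);
        return trim($parts[0]);
    }
    return $remote;
}

// Simulated request: the real peer is 203.0.113.7, the header claims 127.0.0.101.
$server = [
    'REMOTE_ADDR'          => '203.0.113.7',
    'HTTP_X_FORWARDED_FOR' => '127.0.0.101',
];

echo client_ip($server, true), "\n";  // 127.0.0.101 (taken from the forged header)
echo client_ip($server, false), "\n"; // 203.0.113.7 (taken from the connection)
```

A site that limits votes by IP is therefore safer keying on REMOTE_ADDR, and trusting the forwarded header only when the request arrives from a proxy it controls.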
As for the verification code, briefly:
First, view the page in an ordinary browser and find the session ID that corresponds to the verification code.
Note down both the session ID and the verification code value.
Next, use Snoopy to forge the request with them.
Principle: because the session ID is the same, the verification code the server expects is the same one you entered the first time.
4 Other settings: when more needs to be forged, Snoopy has that covered too
PHP Code
$snoopy->proxy_host = "";
$snoopy->proxy_port = "8080"; // Use a proxy
$snoopy->maxredirs = 2; // Maximum number of redirects to follow
$snoopy->expandlinks = true; // Whether to expand links to full URLs; often used when scraping
// For example, the link /images/ can be expanded to its full form https:///images/. The same effect can be had with a string replacement on the final output (the ereg_replace function originally suggested here was removed in PHP 7).
$snoopy->maxframes = 5; // Maximum number of frames to follow
// Note that when frames are fetched, $snoopy->results is an array
echo $snoopy->error; // Error message, if any
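Because the result can be either a string or an array when frames are involved, calling code may want to treat both cases uniformly. The following is a minimal sketch; the helper results_to_list and the sample values are hypothetical stand-ins for $snoopy->results.

```php
<?php
// Hypothetical helper: normalise a result that may be a string (one page)
// or an array of strings (one entry per fetched frame) into a list of documents.
function results_to_list($results): array
{
    return is_array($results) ? array_values($results) : [(string) $results];
}

// Stand-ins for $snoopy->results in both situations.
$single = '<html>main page</html>';
$framed = ['<html>frame 1</html>', '<html>frame 2</html>'];

foreach ([$single, $framed] as $results) {
    foreach (results_to_list($results) as $i => $doc) {
        echo $i, ': ', strlen($doc), " bytes\n";
    }
}
```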
With the basic usage covered, here is a complete demonstration:
PHP Code
<?php
//echo var_dump($_SERVER);
include("");
$snoopy = new Snoopy;
$snoopy->agent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5 FirePHP/0.2.1"; // Browser information: use the values of the browser you viewed the cookies with (PS: $_SERVER shows the browser's information)
$snoopy->referer = "http://bbs./";
$snoopy->expandlinks = true;
$snoopy->rawheaders["COOKIE"]="__utmz=17229162.1227682761.29.=(referral)|utmcsr=|utmcct=/html/|utmcmd=referral; cdbphpchina_smile=1D2D0D1; cdbphpchina_cookietime=2592000; __utma=233700831.1562900865.1227113506.1229613449.1231233266.16; __utmz=233700831.1231233266.16.=(referral)|utmcsr=localhost:8080|utmcct=/|utmcmd=referral; __utma=17229162.1877703507.1227113568.1231228465.1231233160.58; uchome_loginuser=sinopf; xscdb_cookietime=2592000; __utmc=17229162; __utmb=17229162; cdbphpchina_sid=EX5w1V; __utmc=233700831; cdbphpchina_visitedfid=17; cdbphpchinaO766uPYGK6OWZaYlvHSuzJIP22VpwEMGnPQAuWCFL9Fd6CHp2e%2FKw0x4bKz0N9lGk; xscdb_auth=8106rAyhKpQL49eMs%2FyhLBf3C6ClZ%2B2idSk4bExJwbQr%2BHSZrVKgqPOttHVr%2B6KLPg3DtWpTMUI4ttqNNVpukUj6ElM; cdbphpchina_onlineusernum=3721";
$snoopy->fetch("http://bbs.");
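// Literal string replacement; ereg_replace() was removed in PHP 7, and str_replace() gives the same result here
$n = str_replace("href=\"", "href=\"http://bbs./", $snoopy->results);
echo str_replace("src=\"", "src=\"http://bbs./", $n);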
?>
This simulates logging in to the PHPCHINA forum. First, check your browser's information: echo var_dump($_SERVER); shows it.
Copy the value of $_SERVER['HTTP_USER_AGENT'] and paste it into $snoopy->agent.
Next, view your own cookie. After logging in to the forum with your account, enter javascript:() in the browser address bar and press Enter to see your cookie information.
Copy it and paste it after $snoopy->rawheaders["COOKIE"]=. (My cookie information has been removed for security reasons.)
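If the copied cookie string is easier to work with as an array (the same shape as the $snoopy->cookies["PHPSESSID"] assignment in section 3), it can be split into name/value pairs. The helper parse_cookie_string and the sample string below are hypothetical; whether the array form or the raw header suits a particular site better is not verified here.

```php
<?php
// Hypothetical helper: turn "name=value; name2=value2" into an associative array.
// If a name appears twice, the later value wins.
function parse_cookie_string(string $cookieString): array
{
    $cookies = [];
    foreach (explode(';', $cookieString) as $pair) {
        $pair = trim($pair);
        if ($pair === '' || strpos($pair, '=') === false) {
            continue; // skip empty or malformed pieces
        }
        // Split on the first "=" only, because values may contain "=" themselves.
        [$name, $value] = explode('=', $pair, 2);
        $cookies[$name] = $value;
    }
    return $cookies;
}

// Made-up sample; a real string would be copied from the browser.
$copied = 'example_sid=EX5w1V; example_cookietime=2592000';
print_r(parse_cookie_string($copied));
```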
Then note the two replacement calls at the end of the script, which prefix every href=" and src=" with http://bbs./.
Because all addresses in the collected HTML source are relative links, they need to be replaced with absolute links so that the page can reference the forum's pictures and CSS styles.
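The same replacement can be tried on a small sample without fetching anything. The sketch below uses str_replace, since ereg_replace was removed in PHP 7 and the patterns here are plain strings; the helper name absolutize, the sample markup, and the base URL http://bbs.example.invalid/ are hypothetical.

```php
<?php
// Hypothetical helper: prefix href="..." and src="..." values with a base URL.
// Like the two replacement lines above, it assumes every such value is relative.
function absolutize(string $html, string $base): string
{
    return str_replace(
        ['href="', 'src="'],
        ['href="' . $base, 'src="' . $base],
        $html
    );
}

// Made-up sample markup with relative links.
$html = '<a href="forum.php">Forum</a><img src="images/logo.gif">';
echo absolutize($html, 'http://bbs.example.invalid/'), "\n";
```

This blind prefixing would also rewrite links that are already absolute, so it only suits pages where every link is known to be relative.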