I. Approach
The interface we need can be found through the hyperlink feature of the image-and-text message editor in the web version of the WeChat public platform.
Through that interface we can look up a WeChat official account and list all of its articles.
II. Interface analysis
The interface for searching WeChat official accounts:
/cgi-bin/searchbiz?
Parameters:
action=search_biz
begin=0
count=5
query=name of the official account
token=token value of the logged-in account
lang=zh_CN
f=json
ajax=1
Request method:
GET
For this interface we only need two values: the token, which can be read from the page URL after login, and the query, which is the name of the official account to search for.
The interface for listing the articles of that official account:
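As a small illustration of the parameter list above, the following sketch assembles the query string for the search interface. The function name `build_searchbiz_query` is hypothetical, the token value is only the sample value that appears in this article, and the host is left out because the article gives only the path.

```python
from urllib.parse import urlencode


def build_searchbiz_query(name, token, begin=0, count=5):
    """Return the path and query string for the account search interface."""
    params = {
        "action": "search_biz",
        "begin": begin,
        "count": count,
        "query": name,   # name of the official account to search for
        "token": token,  # token taken from the page URL after login
        "lang": "zh_CN",
        "f": "json",
        "ajax": 1,
    }
    # urlencode keeps the dictionary order and escapes non-ASCII names.
    return "/cgi-bin/searchbiz?" + urlencode(params)


print(build_searchbiz_query("example", "557131216"))
```

In practice the same dictionary can be handed to requests through its `params` argument, which performs the encoding itself.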
/cgi-bin/appmsg?
Parameters:
action=list_ex
begin=0
count=5
fakeid=MjM5NDAwMTA2MA==
type=9
query=
token=557131216
lang=zh_CN
f=json
ajax=1
Request method:
GET
This interface needs two values: the token from the previous step and the fakeid, which is returned by the first interface. With both we can fetch the article data of the official account.
III. Implementation
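The parameters of the article interface could be built by a helper like the one below. This is only a sketch: `appmsg_params` is a hypothetical name, and treating `begin` as an item offset (so that page n starts at n * count) is an assumption that the article does not state.

```python
def appmsg_params(fakeid, token, page=0, count=5):
    """Build the query parameters for the article list interface."""
    return {
        "action": "list_ex",
        # Assumption: begin is an offset into the article list.
        "begin": str(page * count),
        "count": str(count),
        "fakeid": fakeid,  # returned by the search interface
        "type": "9",
        "query": "",
        "token": token,    # taken from the page URL after login
        "lang": "zh_CN",
        "f": "json",
        "ajax": "1",
    }


# Sample values copied from the parameter list above.
print(appmsg_params("MjM5NDAwMTA2MA==", "557131216", page=2)["begin"])
```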
Step one:
First we simulate a login with selenium and then collect the cookies and the corresponding token.
from time import sleep
import json

from selenium import webdriver
from selenium.webdriver.common.by import By


def weChat_login(user, password):
    post = {}
    # Assumption: the driver class is missing from the source; Chrome is used here.
    browser = webdriver.Chrome()
    # The host part of this URL is missing in the source and must be filled in.
    browser.get('/')
    sleep(3)
    browser.delete_all_cookies()
    sleep(2)
    # Click to switch to account/password entry
    browser.find_element(By.XPATH, "//a[@class='login__type__container__select-type']").click()
    sleep(2)
    # Simulate the user typing the account and password
    input_user = browser.find_element(By.XPATH, "//input[@name='account']")
    input_user.send_keys(user)
    input_password = browser.find_element(By.XPATH, "//input[@name='password']")
    input_password.send_keys(password)
    sleep(2)
    # Click to log in
    browser.find_element(By.XPATH, "//a[@class='btn_login']").click()
    sleep(2)
    # WeChat login verification
    print('Please scan the QR code')
    sleep(20)
    # Refresh the current page (host missing in the source, as above)
    browser.get('/')
    sleep(5)
    # Get the current page URL
    url = browser.current_url
    # Get the current cookies
    cookies = browser.get_cookies()
    for item in cookies:
        post[item['name']] = item['value']
    # Convert to a JSON string
    cookie_str = json.dumps(post)
    # Store locally (the file name is blank in the source)
    with open('', 'w+', encoding='utf-8') as f:
        f.write(cookie_str)
    print('Cookie saved locally successfully')
    # Split the query string of the current URL to get the token
    paramList = url.strip().split('?')[1].split('&')
    # Dictionary holding the query parameters
    paramdict = {}
    for item in paramList:
        paramdict[item.split('=')[0]] = item.split('=')[1]
    # Return the token
    return paramdict['token']
The login function takes the account and password as parameters and uses a dictionary to store the cookie values. It simulates the user entering the account and password and clicking the login button. A QR-code verification then appears, which is completed by scanning the code with a WeChat app that is logged in to the account.
After refreshing the page, the function saves the current cookies and returns the token.
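The token can also be read from the URL with the standard library instead of manual string splitting, which avoids an error when the URL has no query string. The sketch below assumes only that the token appears as a `token` query parameter, as described above. The function name `extract_token` and the example URL are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_token(url):
    """Return the value of the token query parameter, or None if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("token")
    return values[0] if values else None


# Hypothetical URL, used only to show the shape of the input.
example_url = "https://example.invalid/cgi-bin/home?lang=zh_CN&token=557131216"
print(extract_token(example_url))
```

Returning None instead of raising makes it easy to detect a failed login, where the page URL carries no token.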
Step two:
1. Request the official account search interface to get the fakeid we need
import json

import requests

# token is the value returned by weChat_login in step one.
# The page URL, the HOST header value, the cookie file name and the host part of
# search_url are blank in the source and must be filled in.
url = ''
headers = {
    'HOST': '',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36 Edg/86.0.622.63'
}
# Load the cookies saved during login
with open('', 'r', encoding='utf-8') as f:
    cookie = f.read()
cookies = json.loads(cookie)
resp = requests.get(url=url, headers=headers, cookies=cookies)
search_url = '/cgi-bin/searchbiz?'
params = {
    'action': 'search_biz',
    'begin': '0',
    'count': '5',
    'query': 'Name of the official account to search for',
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1'
}
search_resp = requests.get(url=search_url, cookies=cookies, headers=headers, params=params)
We pass in the token and cookies obtained earlier and send the request to get the JSON data returned by WeChat.
lists = search_resp.json().get('list')[0]
The line above takes the first search result, which holds the data of the matching official account.
fakeid = lists.get('fakeid')
This line reads the fakeid of that account.
2. Request the article list interface to fetch the article data we need
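Indexing the first element directly fails when the search returns nothing, for example after the token has expired. A more defensive version of the lookup above might look like the sketch below. It assumes only the `list` and `fakeid` keys already used in this article. The function name `first_fakeid` and the sample payload are hypothetical.

```python
def first_fakeid(payload):
    """Return the fakeid of the first search hit, or None if there is none."""
    hits = payload.get("list") or []
    if not hits:
        return None
    return hits[0].get("fakeid")


# Hypothetical payload with the same shape as the parsed JSON response.
sample = {"list": [{"fakeid": "MjM5NDAwMTA2MA=="}]}
print(first_fakeid(sample))
```

It would be called as `first_fakeid(search_resp.json())`.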
# The host part of appmsg_url is missing in the source and must be filled in.
# requests, cookies, headers, token and fakeid come from the previous steps.
appmsg_url = '/cgi-bin/appmsg?'
params_data = {
    'action': 'list_ex',
    'begin': '0',
    'count': '5',
    'fakeid': fakeid,
    'type': '9',
    'query': '',
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1'
}
appmsg_resp = requests.get(url=appmsg_url, cookies=cookies, headers=headers, params=params_data)
We pass in the fakeid and token and send the request in the same way to get the returned JSON data.
This completes the crawling of the WeChat articles.
IV. Summary
To crawl WeChat official account articles, you need to know how to use selenium and requests and how to find the request interfaces. Note that when looping over the articles you must set a delay between requests. Otherwise the account is easily blocked and no data is returned.
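The delay mentioned above can be sketched as a small loop that pauses for a random interval between page requests. Everything here is illustrative: `crawl_pages` is a hypothetical helper, `fetch_page` stands for whatever function requests one page of the article list, and the delay bounds are arbitrary values rather than limits published by WeChat.

```python
import random
import time


def crawl_pages(fetch_page, pages, min_delay=3.0, max_delay=8.0, sleep=time.sleep):
    """Call fetch_page(0..pages-1), pausing a random time between calls."""
    results = []
    for page in range(pages):
        results.append(fetch_page(page))
        # No pause is needed after the last page.
        if page < pages - 1:
            sleep(random.uniform(min_delay, max_delay))
    return results
```

The `sleep` argument is injectable so the loop can be tested without waiting. In a real run, `fetch_page` would send the article list request with the `begin` parameter for that page.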