SoFunction
Updated on 2024-10-29

Python WeChat public number article crawling sample code

I. Thoughts

We get the interface we need through a hyperlink in the graphic message of the web version of the WeChat public platform

图文消息

超链接

From the interface we can get the corresponding WeChat public number and all the corresponding WeChat public number articles.

II. Interface analysis

Get the interface to WeChat:
/cgi-bin/searchbiz?
Parameters:
action=search_biz
begin=0
count=5
query=public name
token = token value for each account
lang=zh_CN
f=json
ajax=1
Request method:
GET
So in this interface we just need to get the token, and the query is the public number you need to search, and the token can be obtained from the web link after login.

微信公众号

The interface to get the article of the corresponding public number:
/cgi-bin/appmsg?
Parameters:
action=list_ex
begin=0
count=5
fakeid=MjM5NDAwMTA2MA==
type=9
query=
token=557131216
lang=zh_CN
f=json
ajax=1
Request method:
GET
The values we need to get in this interface are the token from the previous step and the fakeid, which is available in the first interface. Thus we can get the data of the WeChat public number article.

微信公众号

III. Realization

 Step one:

First we need to simulate a login via selenium and then get the cookie and corresponding token

def weChat_login(user, password):
  post = {}
  browser = ()
  ('/')
  sleep(3)
  browser.delete_all_cookies()
  sleep(2)
  # Click to switch to account password entry
  browser.find_element_by_xpath("//a[@class='login__type__container__select-type']").click()
  sleep(2)
  # Simulate user clicks
  input_user = browser.find_element_by_xpath("//input[@name='account']")
  input_user.send_keys(user)
  input_password = browser.find_element_by_xpath("//input[@name='password']")
  input_password.send_keys(password)
  sleep(2)
  # Click to login
  browser.find_element_by_xpath("//a[@class='btn_login']").click()
  sleep(2)
  # WeChat Login Verification
  print('Please scan the QR code')
  sleep(20)
  # Refresh the current page
  ('/')
  sleep(5)
  # Get the current web link
  url = browser.current_url
  # Get current cookie
  cookies = browser.get_cookies()
  for item in cookies:
    post[item['name']] = item['value']
  # Convert to String
  cookie_str = (post)
  # Store locally
  with open('', 'w+', encoding='utf-8') as f:
    (cookie_str)
  print('Cookie saved locally successfully')
  # Slice the current web link to get the token
  paramList = ().split('?')[1].split('&')
  # Define a dictionary to store data
  paramdict = {}
  for item in paramList:
    paramdict[('=')[0]] = ('=')[1]
  # Return the token
  return paramdict['token']

A login method is defined with the login account and password as parameters, and then a dictionary is defined to store the cookie value. By simulating the user to enter the corresponding account password and click on the login, then a code scanning verification will appear, with the login of WeChat to scan the code can be.
After refreshing the current page, get the current cookie as well as the token and return.

Step two:

1. request to get the corresponding public interface, get the fakeid we need

 url = ''
  headers = {
    'HOST': '',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.183 Safari/537.36 Edg/86.0.622.63'
  }
  with open('', 'r', encoding='utf-8') as f:
    cookie = ()
  cookies = (cookie)
  resp = (url=url, headers=headers, cookies=cookies)
  search_url = '/cgi-bin/searchbiz?'
  params = {
    'action': 'search_biz',
    'begin': '0',
    'count': '5',
    'query': 'Name of the public number searched for',
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1'
  }
  search_resp = (url=search_url, cookies=cookies, headers=headers, params=params)

Pass in the token and cookie we've obtained, and then get the returned json data from WeChat by requesting it

lists = search_resp.json().get('list')[0]

With the above code, you can get the corresponding public number data

fakeid = ('fakeid')

With the above code you can get the corresponding fakeid

2. request to get WeChat public number article interface, fetch the article data we need

 appmsg_url = '/cgi-bin/appmsg?'
  params_data = {
    'action': 'list_ex',
    'begin': '0',
    'count': '5',
    'fakeid': fakeid,
    'type': '9',
    'query': '',
    'token': token,
    'lang': 'zh_CN',
    'f': 'json',
    'ajax': '1'
  }
  appmsg_resp = (url=appmsg_url, cookies=cookies, headers=headers, params=params_data)

We pass in the fakeid and token and then still call the request interface to get the returned json data.
We then realized the crawling of WeChat articles.

IV. Summary

By crawling WeChat public number articles, you need to master the usage of selenium and requests, and how to get the request interface. However, it should be noted that when we cycle through the articles, we must set the delay time, otherwise the account will easily be blocked, so that we can not get the returned data.

This article on Python WeChat public number article crawling sample code is introduced to this article, more related Python WeChat public number article crawling content, please search my previous articles or continue to browse the following related articles I hope that you will support me in the future more!