SoFunction
Updated on 2024-10-28

Example: crawling index data with a Python crawler

Some data cannot be read directly off a page; we have to crawl for it. The word "index" sounds complicated to some readers, as if it only comes up when talking about stocks, where analyzing rises and falls is a tricky problem. Still, index data is very helpful for data analysis, so today I will explain how to capture index data with a Python crawler.

I needed this crawler a few days ago and found that the Baidu Index request had changed slightly, so I updated the code:

import requests
import sys
import time
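# The paths below appear without a scheme or host in the source; prepend the site's address before use.
word_url = '/api/SearchApi/thumbnail?area=0&word={}'
# Cookie string copied from a logged-in browser session.
COOKIES = ''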
headers = {
 'Accept': 'application/json, text/plain, */*',
 'Accept-Encoding': 'gzip, deflate',
 'Accept-Language': 'zh-CN,zh;q=0.9',
 'Cache-Control': 'no-cache',
 'Cookie': COOKIES,
 'DNT': '1',
 'Host': '',  # the host name was dropped in the source; fill it in or delete this line
 'Pragma': 'no-cache',
 'Proxy-Connection': 'keep-alive',
 'Referer': '/v2/main/',
 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.90 Safari/537.36',
 'X-Requested-With': 'XMLHttpRequest',
}
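def decrypt(t, e):
    # t is the key returned by get_ptbk: its first half lists the cipher
    # characters and its second half the characters they stand for.
    n = list(t)
    a = {}
    result = []
    ln = len(n) // 2
    start = n[ln:]
    end = n[:ln]
    for j, k in zip(start, end):
        a.update({k: j})
    # Map every character of the encoded string e back through the table.
    for j in e:
        result.append(a.get(j))
    return ''.join(result)
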
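def get_ptbk(uniqid):
    # The scheme and host are missing from this path in the source; prepend
    # them before running.
    url = '/Interface/ptbk?uniqid={}'
    resp = requests.get(url.format(uniqid), headers=headers)
    if resp.status_code != 200:
        print('Failed to get uniqid')
        sys.exit(1)
    return resp.json().get('data')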
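def get_index_data(keyword, start='2011-01-03', end='2019-08-05'):
    # Turn the Python repr of the keyword list into JSON-style double quotes.
    keyword = str(keyword).replace("'", '"')
    # The scheme and host are missing from this path in the source; prepend
    # them before running.
    url = f'/api/SearchApi/index?area=0&word={keyword}&area=0&startDate={start}&endDate={end}'
    resp = requests.get(url, headers=headers)
    if resp.status_code != 200:
        print('Failed to get index')
        sys.exit(1)
    content = resp.json()
    data = content.get('data')
    user_indexes = data.get('userIndexes')[0]
    uniqid = data.get('uniqid')
    # The key may come back empty, so keep asking until one arrives.
    ptbk = get_ptbk(uniqid)
    while ptbk is None or ptbk == '':
        ptbk = get_ptbk(uniqid)
    all_data = user_indexes.get('all').get('data')
    result = decrypt(ptbk, all_data)
    result = result.split(',')
    print(result)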
if __name__ == '__main__':
    words = [[{"name": "Kool-Aid.", "wordType": 1}]]
    get_index_data(words)

Output: the script prints the decoded index values as a list of strings.

 

Running the code returns the index data we want. The same approach also works for looking at stock data and similar tasks; a Python crawler is a good choice for all of them, and interested readers are welcome to try it themselves.
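The decrypt step in the script is the least obvious part, so here is a minimal sketch of it on toy input. The key and the encoded string below are invented purely for illustration and are not real responses from the site. The sketch assumes the same layout the script relies on: the first half of the key holds the cipher characters and the second half holds the characters they stand for.

```python
def decrypt(key, encoded):
    """Decode `encoded` with a substitution table built from `key`."""
    half = len(key) // 2
    # First half of the key: cipher characters; second half: their meanings.
    table = dict(zip(key[:half], key[half:]))
    return ''.join(table[ch] for ch in encoded)


# Hypothetical key and payload, made up for this example only.
toy_key = 'abcdefghijk' + '0123456789,'
toy_encoded = 'bcdkefkj'

decoded = decrypt(toy_key, toy_encoded)
values = decoded.split(',')
print(decoded)  # 123,45,9
print(values)   # ['123', '45', '9']
```

Unlike the script's version, this sketch raises a KeyError when the encoded string contains a character that is not in the key, which makes a mismatched key easier to spot.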

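One possible refinement: the script calls get_ptbk in a tight loop until a key comes back, which never stops if the server keeps returning nothing. Below is a rough sketch of a bounded retry with a pause between attempts. The helper name retry_until_value and its parameters are hypothetical and not part of the original script, and the stand-in fetch function only simulates empty responses.

```python
import time


def retry_until_value(fetch, attempts=5, delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-empty value or attempts run out."""
    for attempt in range(1, attempts + 1):
        value = fetch()
        if value:  # rejects both None and ''
            return value
        if attempt < attempts:
            sleep(delay)  # pause before asking again
    raise RuntimeError(f'no value after {attempts} attempts')


# Stand-in for get_ptbk(uniqid): empty twice, then a made-up key.
_responses = iter([None, '', 'fake-key'])
key = retry_until_value(lambda: next(_responses), delay=0.0)
print(key)  # fake-key
```

In the script, the while loop could then be replaced by something like retry_until_value(lambda: get_ptbk(uniqid)), assuming a short delay is acceptable.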