The address of the Oriental Fortune website is as follows:
/center/#hs_a_board
When we click to the next page on the site, the content of the page changes but the URL stays the same, which means Ajax is being used here to pull data from the server dynamically. The advantage of this approach is that part of the page can be updated without reloading the whole page, which reduces network load and speeds up page loading.
Opening the browser developer tools with F12 and checking the network requests, we can easily see that the data on the page is fetched from the following address:
http://38./api/qt/clist/get?cb=jQuery112409036039385296142_1658838397275&pn=3&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1658838404848
Next, we make a few more requests to see how the address changes, and we find that the pn parameter represents the page number, so we can change the number after &pn= to request the data for different pages.
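Rather than hand-editing that long query string every time, the page number can also be rewritten programmatically. Below is a minimal sketch using only the standard library; captured_url is just a placeholder name for whatever address was copied from the browser:

from urllib.parse import urlsplit, urlunsplit, parse_qs, urlencode

def with_page(url, page):
    # return the same request URL with only the pn (page number) parameter changed
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query["pn"] = [str(page)]
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))

# e.g. with_page(captured_url, 3) asks for the third page of results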
import requests

json_url = "http://48./api/qt/clist/get?cb=jQuery112402508937289440778_1658838703304&pn=1&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1658838703305"
res = requests.get(json_url)
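Before parsing anything, it is worth a quick look at what actually came back. Assuming the request above succeeded, the body should start with the callback name from the cb parameter, i.e. a JSONP wrapper rather than plain JSON:

print(res.status_code)   # 200 means the request itself succeeded
print(res.text[:80])     # should start with "jQuery...(" rather than "{", i.e. a JSONP wrapper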
Data processing

Next, we look at the returned data and see that it is not standard JSON.
So we first convert it to JSON:
result = ("jQuery112402508937289440778_1658838703304")[1].split("(")[1].split(");")[0] result_json = (result) result_json
Output:
The meaning of each returned parameter is as follows (a dictionary version of this mapping is sketched just after the list):
- f2: latest price
- f3: change percentage
- f4: change amount
- f5: volume (lots)
- f6: turnover
- f7: amplitude
- f8: turnover rate
- f9: price-earnings (P/E) ratio
- f10: volume ratio
- f12: stock code
- f14: stock name
- f15: highest price
- f16: lowest price
- f17: today's open
- f18: yesterday's close
- f22: price-to-book (P/B) ratio
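To avoid repeating these opaque field codes throughout the code, the mapping can be kept in a single dictionary. This is only a convenience sketch; the English column names are the ones used in this article, not names returned by the API:

FIELDS = {
    "f12": "Stock Code",
    "f14": "Stock Name",
    "f2": "Latest Price",
    "f3": "Change (%)",
    "f4": "Change Amount",
    "f5": "Volume (Lots)",
    "f6": "Turnover",
    "f7": "Amplitude",
    "f8": "Turnover Rate",
    "f9": "P/E Ratio",
    "f10": "Volume Ratio",
    "f15": "Highest",
    "f16": "Lowest",
    "f17": "Today's Open",
    "f18": "Yesterday's Close",
    "f22": "P/B Ratio",
}

# e.g. [item.get(code) for code in FIELDS] gives one row of values in a fixed column order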
First, prepare a function for saving the data:
import os

def save_data(data, date):
    if not os.path.exists(r'stock_data_%s.csv' % date):
        # first write for this date: create the file and add a header row
        with open("stock_data_%s.csv" % date, "a+", encoding='utf-8') as f:
            f.write("Stock Code,Stock Name,Latest Price,Change (%),Change Amount,Volume (Lots),Turnover,Amplitude,Turnover Rate,P/E Ratio,Volume Ratio,Highest,Lowest,Today's Open,Yesterday's Close,P/B Ratio\n")
            for i in data:
                Code = i["f12"]
                Name = i["f14"]
                Close = i['f2']
                ChangePercent = i["f3"]
                Change = i['f4']
                Volume = i['f5']
                Amount = i['f6']
                Amplitude = i['f7']
                TurnoverRate = i['f8']
                PERatio = i['f9']
                VolumeRate = i['f10']
                High = i['f15']
                Low = i['f16']
                Open = i['f17']
                PreviousClose = i['f18']
                PB = i['f22']
                row = '{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}'.format(
                    Code, Name, Close, ChangePercent, Change, Volume, Amount, Amplitude,
                    TurnoverRate, PERatio, VolumeRate, High, Low, Open, PreviousClose, PB)
                f.write(row)
                f.write('\n')
    else:
        # the file for this date already exists
        ...
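Joining the fields with ',' by hand works for this data, but it would silently produce misaligned rows if any field ever contained a comma. A variant using Python's built-in csv module, shown below only as an alternative sketch, writes the same columns with proper quoting:

import csv
import os

def save_data_csv(data, date):
    filename = "stock_data_%s.csv" % date   # same naming scheme as save_data above
    new_file = not os.path.exists(filename)
    with open(filename, "a+", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["Stock Code", "Stock Name", "Latest Price", "Change (%)",
                             "Change Amount", "Volume (Lots)", "Turnover", "Amplitude",
                             "Turnover Rate", "P/E Ratio", "Volume Ratio", "Highest",
                             "Lowest", "Today's Open", "Yesterday's Close", "P/B Ratio"])
        for item in data:
            writer.writerow([item[code] for code in
                             ["f12", "f14", "f2", "f3", "f4", "f5", "f6", "f7", "f8",
                              "f9", "f10", "f15", "f16", "f17", "f18", "f22"]])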
Then we pass in the JSON data we processed earlier:
stock_data = result_json['data']['diff']
save_data(stock_data, '2022-07-28')
This gives us the first page of stock data.
Finally, we just need to loop through and crawl all the pages:
for i in range(1, 5):
    print("Crawl page %s" % str(i))
    url = "http://48./api/qt/clist/get?cb=jQuery112402508937289440778_1658838703304&pn=%s&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1658838703305" % str(i)
    res = requests.get(url)
    result = res.text.split("jQuery112402508937289440778_1658838703304")[1].split("(")[1].split(");")[0]
    result_json = json.loads(result)
    stock_data = result_json['data']['diff']
    save_data(stock_data, '2022-07-28')
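When crawling more than a handful of pages, it is also worth pausing between requests and stopping once a page comes back empty. The loop below is only a sketch of that idea, reusing json_url and save_data from above; the one-second delay and the 50-page cap are arbitrary choices, not values from the original code:

import json
import time

import requests

for page in range(1, 51):
    url = json_url.replace("pn=1", "pn=%d" % page)   # reuse the captured URL, changing only the page number
    res = requests.get(url, timeout=10)
    res.raise_for_status()                           # stop early on an HTTP error
    payload = res.text[res.text.index("(") + 1 : res.text.rindex(")")]
    data = json.loads(payload).get("data") or {}
    stock_data = data.get("diff") or []
    if not stock_data:                               # past the last page the API returns no rows
        break
    save_data(stock_data, "2022-07-28")
    time.sleep(1)                                    # be polite to the server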
This completes our entire stock data crawl.
This concludes this article on using Python to crawl real-time stock data. For more on crawling stock data with Python, please search my earlier articles or browse the related articles below, and I hope you will continue to support me!