SoFunction
Updated on 2024-10-30

Crawling real-time stock data details with Python

The address of the Oriental Fortune website is as follows:

/center/#hs_a_board


Clicking through to the next page of the site, we find that the page content changes while the URL stays the same, which means Ajax is used here to pull data from the server dynamically. The advantage of this approach is that part of the page can be updated without reloading the whole page, reducing network load and speeding up page loading.

Opening the browser's developer tools with F12 and checking the Network tab, we can easily find that the data on the page is requested from the following address:

http://38./api/qt/clist/get?cb=jQuery112409036039385296142_1658838397275&pn=3&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1658838404848


Next we make a few more requests to see how that address changes, and find that the pn parameter represents the page number. So we can modify the number after &pn= to access the data on different pages:
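Rather than editing the query string by hand, the pn parameter can also be rewritten programmatically with Python's standard urllib.parse module. This is only a sketch using a shortened, made-up version of the address; the real request needs the full parameter set shown above:

```python
from urllib.parse import urlparse, parse_qs, urlencode, urlunparse

# A shortened, made-up version of the API address, for illustration only
base_url = "http://example.com/api/qt/clist/get?cb=jQuery123&pn=1&pz=20&po=1"

def with_page(url, page):
    """Return a copy of the URL with the pn query parameter set to the given page."""
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["pn"] = [str(page)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

print(with_page(base_url, 3))
```

This keeps the rest of the parameters intact while only the page number changes between requests.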

import requests

json_url = "http://48./api/qt/clist/get?cb=jQuery112402508937289440778_1658838703304&pn=1&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1658838703305"
res = requests.get(json_url)

Data processing

Next we look at the returned data and see that it is not standard JSON data.


The response is wrapped in a jQuery callback, so we strip the wrapper and then parse the remaining JSON:

import json

result = res.text.split("jQuery112402508937289440778_1658838703304")[1].split("(")[1].split(");")[0]
result_json = json.loads(result)
result_json
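The split chain above works, but it is brittle if the callback name changes between requests. As a sketch (the sample payload below is made up), the JSONP wrapper can also be stripped with a regular expression that matches any callback name:

```python
import json
import re

# Example JSONP response wrapped in a jQuery callback (made-up data)
raw = 'jQuery112402508937289440778_1658838703304({"data": {"total": 4536, "diff": [{"f12": "000001", "f14": "Ping An Bank"}]}});'

def strip_jsonp(text):
    """Extract the JSON payload from a JSONP response of the form callback({...});"""
    match = re.search(r'^\w+\((.*)\);?$', text, re.S)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))

result_json = strip_jsonp(raw)
print(result_json["data"]["total"])  # 4536
```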

Output:


The meaning of each returned field:

  • f2: latest price
  • f3: change percentage
  • f4: change amount
  • f5: volume (lots)
  • f6: turnover
  • f7: amplitude
  • f8: turnover rate
  • f9: price-earnings ratio
  • f10: volume ratio
  • f12: stock code
  • f14: stock name
  • f15: highest price
  • f16: lowest price
  • f17: today's open
  • f18: previous close
  • f22: price-to-book ratio
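For convenience, the field codes can be collected into a lookup table so records become self-describing. This is only a sketch: the English column names below are my own, derived from the list above, not anything defined by the API:

```python
# Map the numeric field codes to readable column names
# (names are my own choice, based on the field meanings listed above)
FIELD_NAMES = {
    "f2": "close", "f3": "change_percent", "f4": "change",
    "f5": "volume", "f6": "amount", "f7": "amplitude",
    "f8": "turnover_rate", "f9": "pe_ratio", "f10": "volume_ratio",
    "f12": "code", "f14": "name", "f15": "high", "f16": "low",
    "f17": "open", "f18": "previous_close", "f22": "pb_ratio",
}

def rename_fields(record):
    """Replace fNN keys with readable names, dropping unmapped fields."""
    return {FIELD_NAMES[k]: v for k, v in record.items() if k in FIELD_NAMES}

sample = {"f12": "000001", "f14": "Ping An Bank", "f2": 12.3, "f99": None}
print(rename_fields(sample))  # {'code': '000001', 'name': 'Ping An Bank', 'close': 12.3}
```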

First, prepare a function to save the data:

import os

def save_data(data, date):
    file_name = "stock_data_%s.csv" % date
    # Write the header row only if the file does not exist yet
    is_new = not os.path.exists(file_name)
    with open(file_name, "a+", encoding="utf-8") as f:
        if is_new:
            f.write("Stock Code,Stock Name,Latest Price,Up/Down,Up/Down Amount,Volume (Lots),Turnover,Amplitude,Turnover Ratio,P/E Ratio,Volume Ratio,Highest,Lowest,Today's Open,Yesterday's Close,P/B Ratio\n")
        for i in data:
            Code = i["f12"]
            Name = i["f14"]
            Close = i["f2"]
            ChangePercent = i["f3"]
            Change = i["f4"]
            Volume = i["f5"]
            Amount = i["f6"]
            Amplitude = i["f7"]
            TurnoverRate = i["f8"]
            PERatio = i["f9"]
            VolumeRate = i["f10"]
            High = i["f15"]
            Low = i["f16"]
            Open = i["f17"]
            PreviousClose = i["f18"]
            PB = i["f22"]
            row = "{},{},{},{},{},{},{},{},{},{},{},{},{},{},{},{}".format(
                Code, Name, Close, ChangePercent, Change, Volume, Amount, Amplitude,
                TurnoverRate, PERatio, VolumeRate, High, Low, Open, PreviousClose, PB)
            f.write(row)
            f.write("\n")

Then pass in the JSON data processed earlier:

stock_data = result_json['data']['diff']
save_data(stock_data, '2022-07-28')

This gives us the first page of stock data:


In the end we just need to loop through and crawl all the pages:

for i in range(1, 5):
    print("Crawling page %s" % str(i))
    url = "http://48./api/qt/clist/get?cb=jQuery112402508937289440778_1658838703304&pn=%s&pz=20&po=1&np=1&ut=bd1d9ddb04089700cf9c27f6f7426281&fltt=2&invt=2&wbp2u=|0|0|0|web&fid=f3&fs=m:0+t:6,m:0+t:80,m:1+t:2,m:1+t:23,m:0+t:81+s:2048&fields=f1,f2,f3,f4,f5,f6,f7,f8,f9,f10,f12,f13,f14,f15,f16,f17,f18,f20,f21,f23,f24,f25,f22,f11,f62,f128,f136,f115,f152&_=1658838703305" % str(i)
    res = requests.get(url)
    result = res.text.split("jQuery112402508937289440778_1658838703304")[1].split("(")[1].split(");")[0]
    result_json = json.loads(result)
    stock_data = result_json["data"]["diff"]
    save_data(stock_data, "2022-07-28")
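To sanity-check the result, the saved file can be read back with the standard csv module. The snippet below is a self-contained sketch that first writes a small sample file (stock_data_sample.csv is a made-up name with only a few of the columns); with real data you would point DictReader at the stock_data_<date>.csv file written by save_data:

```python
import csv

# Write a small sample file so this example is self-contained
sample = (
    "Stock Code,Stock Name,Latest Price\n"
    "000001,Ping An Bank,12.30\n"
    "600519,Kweichow Moutai,1899.00\n"
)
with open("stock_data_sample.csv", "w", encoding="utf-8") as f:
    f.write(sample)

# Read the CSV back; DictReader maps each row to the header columns
with open("stock_data_sample.csv", encoding="utf-8") as f:
    rows = list(csv.DictReader(f))

print(len(rows), rows[0]["Stock Name"])  # 2 Ping An Bank
```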

This completes our entire stock data crawl.

This concludes the article on using Python to crawl real-time stock data. For more on crawling stock data with Python, please search my previous articles or continue browsing the related articles below. I hope you will continue to support me!