SoFunction
Updated on 2024-12-19

Fixing garbled web page content when crawling with Python requests

Recently I have been learning to write crawlers in Python and have run into quite a few problems with requests, such as how to use cookies for login verification, which is covered in this article. This post addresses how to avoid garbled text when using requests.

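import requests
# The URL was lost from the original post; put the address of the page to fetch between the quotes.
res = requests.get("")
# Print the raw response body.
print(res.content)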

The above is a simple way to fetch a web page with requests, but it very easily produces garbled text.

To see which encoding the page declares, right-click on the page and view its source: content="text/html;charset=utf-8".
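The same check can be done in code by searching the raw bytes of the page for the charset declaration. The following is only a rough sketch in Python 3 syntax: `sniff_meta_charset` is a hypothetical helper (not part of requests), the sample HTML is made up for illustration, and a regular expression is no substitute for a real HTML parser.

```python
import re

# Matches charset=... inside a <meta> tag, with or without quotes around the value.
_META_CHARSET = re.compile(rb"<meta[^>]+charset=[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE)

def sniff_meta_charset(html_bytes):
    """Return the charset declared in a <meta> tag, lower-cased, or None if absent."""
    # The declaration normally sits near the top of the document.
    match = _META_CHARSET.search(html_bytes[:4096])
    return match.group(1).decode("ascii").lower() if match else None

# Made-up sample page carrying the same declaration as in the text above.
sample = (b'<html><head><meta http-equiv="Content-Type" '
          b'content="text/html;charset=utf-8"></head></html>')
print(sniff_meta_charset(sample))
```

With a real response, the bytes to pass in would be `res.content`.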

This tells us that the page is encoded in utf8. The Chinese environment the output goes to uses gbk (as is typical for a Windows console with a Chinese locale), so the content has to be converted to gbk before it is displayed.

I looked around and read that requests can detect a page's encoding automatically, and the encoding it reported was indeed utf8. Yet the Chinese in the output was still garbled. Some people say you can simply set the encoding attribute of the response, res.encoding = 'gbk', but that did not work when I tried it.
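To make the role of the reported encoding more concrete, here is a small sketch of how one might choose the encoding used to decode the body. It relies on three attributes that requests.Response provides (`content`, `encoding`, `apparent_encoding`); `pick_encoding` and `decode_body` are hypothetical helpers, and the response is a stand-in object so that the example runs without network access. It assumes, as far as I know, that requests falls back to ISO-8859-1 for text pages whose headers carry no charset, which is a frequent cause of garbled text. Python 3 syntax.

```python
from types import SimpleNamespace

def pick_encoding(res):
    """Prefer the declared encoding unless it is missing or the ISO-8859-1 fallback."""
    declared = (res.encoding or "").lower()
    if declared and declared != "iso-8859-1":
        return declared
    # Otherwise trust the encoding guessed from the body, defaulting to UTF-8.
    return res.apparent_encoding or "utf-8"

def decode_body(res):
    """Decode the raw bytes; undecodable bytes become U+FFFD instead of raising."""
    return res.content.decode(pick_encoding(res), "replace")

# Stand-in for a response whose headers declared no charset (assumption for the demo).
fake = SimpleNamespace(
    content="中文".encode("gbk"),
    encoding="ISO-8859-1",
    apparent_encoding="gbk",
)
print(pick_encoding(fake))
print(decode_body(fake) == "中文")
```

With a real response object the call would simply be `decode_body(res)`.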

Python represents text internally as Unicode. That is, when handling a string in some other encoding, Python first decodes it to Unicode and then encodes the result into whatever encoding you want to output.

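For example, s = "中文" is a string of type str encoded in gb2312 (this is Python 2, where a str holds bytes).

It first needs s.decode("gb2312") to decode the gb2312 bytes to Unicode.

Then, for output, the result is encoded as gbk with encode("gbk"), that is, s.decode("gb2312").encode("gbk").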
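The same two steps can be sketched in Python 3, where the distinction is explicit: `str` is always Unicode text and only `bytes` has a `decode` method. The variable names below are chosen for illustration only.

```python
text = "中文"                          # str: Unicode text
gb2312_bytes = text.encode("gb2312")   # bytes as they might arrive from a gb2312 source

decoded = gb2312_bytes.decode("gb2312")  # step 1: gb2312 bytes -> Unicode text
gbk_bytes = decoded.encode("gbk")        # step 2: Unicode text -> gbk bytes for output

print(decoded == text)
print(gbk_bytes)
```

Unicode is the pivot in both directions, so converting between any two encodings is always a decode followed by an encode.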

Back to business: for the page content we fetched into res, res.content.decode("utf8", "ignore").encode("gbk", "ignore") gives output without garbled text.

The "ignore" argument used here tells the codec to skip any bytes or characters that cannot be converted instead of raising an error, so only the valid ones are kept.
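The effect of "ignore" is easiest to see on deliberately broken input. The sketch below uses Python 3 syntax; `utf8_to_gbk` is a hypothetical helper that wraps the decode and encode chain from the text, and the sample bytes are made up for illustration.

```python
def utf8_to_gbk(raw):
    """Decode UTF-8 bytes and re-encode as GBK, dropping whatever cannot be converted."""
    return raw.decode("utf8", "ignore").encode("gbk", "ignore")

# "中文" in UTF-8 with a stray invalid byte (0xff) in the middle.
broken = b"\xe4\xb8\xad\xff\xe6\x96\x87"
print(broken.decode("utf8", "ignore") == "中文")   # the invalid byte is skipped

# The emoji is valid UTF-8 but has no GBK representation, so it is dropped on encode.
with_emoji = "中😀文".encode("utf8")
print(utf8_to_gbk(with_emoji) == "中文".encode("gbk"))
```

Because "ignore" discards data silently, passing "replace" instead can be a better choice while debugging, since the substituted characters show where something was lost.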

Summary

That is everything in this article on fixing garbled web page content fetched with Python requests; I hope it helps. Interested readers can also refer to other articles on this site, such as "Adding the cookies parameter to network requests with Python requests" and "Python LDA implementation in detail". If anything is lacking, please leave a message to point it out. Thank you for supporting this site!