SoFunction
Updated on 2024-10-29

Python uses chardet to determine character encoding

This article shows how to use chardet in Python to detect character encodings. The details are as follows:

chardet can detect the encoding of both byte strings and files in Python.

1. Download and install chardet

Download: /pypi/chardet

After downloading chardet, unzip the archive and put the chardet folder directly in your application's directory; you can then use import chardet right away. Alternatively, copy the chardet folder into Python's system directory so that every Python program on the machine can import it, or run the setup script from the unpacked archive:

python setup.py install

2. Examples

chardet.detect() takes a byte string and returns a dictionary, in which confidence is how sure the detector is of its guess (a value between 0 and 1) and encoding is the name of the detected encoding.
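As a minimal sketch of reading that dictionary, the snippet below wraps chardet.detect() in a small helper. The name describe_detection and the sample bytes are assumptions made for illustration; only chardet.detect() itself comes from the library.

```python
import chardet


def describe_detection(raw):
    """Return (encoding, confidence) for a byte string (hypothetical helper)."""
    result = chardet.detect(raw)
    # 'encoding' can be None when chardet has nothing to base a guess on,
    # for example when the input is empty.
    return result['encoding'], result['confidence']


encoding, confidence = describe_detection(b'plain ASCII text')
print(encoding, confidence)
```

Treat the confidence value as a hint rather than a guarantee: short inputs give chardet little to work with, so the guess is less reliable for them.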

(1) Web page encoding judgment:

>>> import urllib.request
>>> rawdata = urllib.request.urlopen('/').read()
>>> import chardet
>>> chardet.detect(rawdata)
{'confidence': 0.98999999999999999, 'encoding': 'GB2312'}
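Once the encoding has been detected, the usual next step is to decode the raw bytes into text. The following is a rough sketch of that step, not part of the original example: decode_bytes and its fallback parameter are hypothetical names, and the sample bytes stand in for a downloaded page.

```python
import chardet


def decode_bytes(raw, fallback='utf-8'):
    """Decode raw bytes using chardet's guess (hypothetical helper)."""
    guess = chardet.detect(raw)
    # chardet reports None when it cannot guess, so use the fallback then.
    encoding = guess['encoding'] or fallback
    try:
        # errors='replace' avoids an exception if the guess turns out wrong.
        return raw.decode(encoding, errors='replace')
    except LookupError:
        # chardet may name an encoding that Python has no codec for.
        return raw.decode(fallback, errors='replace')


text = decode_bytes(b'<html><body>hello</body></html>')
print(text)
```

Replacing undecodable bytes is a design choice made here for robustness; if silent substitution is not acceptable, pass errors='strict' instead and handle the UnicodeDecodeError.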

(2) Document encoding judgment

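import chardet

# Open in binary mode: chardet.detect() expects raw bytes, not decoded text.
tt = open('c:\\', 'rb')
ff = tt.read()
# read(5) also works here, but readlines() raises an error because it
# returns a list of lines rather than a single byte string.
enc = chardet.detect(ff)
print(enc['encoding'])
tt.close()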
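For large files, reading everything into memory just to detect the encoding can be wasteful. chardet also ships a UniversalDetector class that accepts data in pieces and can stop early once it is confident. The sketch below assumes a hypothetical helper name, detect_file_encoding, and an arbitrary chunk size of 4096 bytes.

```python
from chardet.universaldetector import UniversalDetector


def detect_file_encoding(path, chunk_size=4096):
    """Detect a file's encoding incrementally (hypothetical helper)."""
    detector = UniversalDetector()
    with open(path, 'rb') as f:
        # Read fixed-size binary chunks until the file is exhausted.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            detector.feed(chunk)
            # 'done' becomes True once the detector is sure of its answer.
            if detector.done:
                break
    # close() finalizes the guess; the result has the same keys as detect().
    detector.close()
    return detector.result
```

Calling detect_file_encoding('some_file.txt')['encoding'] then gives the detected encoding name, where 'some_file.txt' is a placeholder for your own file.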

I hope that what I have described in this article will help you in your Python programming.