This article describes how to recognize Chinese text in images with Python 3. It is shared here for your reference:
1. Operating environment
(1) win10
(2) pycharm
(3) python 3.5
(4) Install the pillow and pytesseract libraries:
pip3 install pillow
pip3 install pytesseract
(5) Download the OCR engine tesseract-ocr, then unzip and install it. Download address: https:///softs/
2. Run the code
# -*- coding: utf-8 -*-
from PIL import Image
import pytesseract
# The lines above only import the packages; the single call below performs the text recognition.
# 'your_image.png' is a placeholder: replace it with the path of your own image.
text = pytesseract.image_to_string(Image.open('your_image.png'), lang='chi_sim')  # recognize Simplified Chinese
# text = pytesseract.image_to_string(Image.open('your_image.png'), lang='eng')  # recognize English letters and Arabic numerals instead
print(text)
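The call above recognizes one language at a time. Tesseract also accepts several language codes joined with a plus sign, such as chi_sim+eng, which helps when an image mixes Chinese and English. The sketch below wraps this in two small helpers; the names build_lang and recognize are hypothetical, and it assumes Pillow, pytesseract and the language data for every requested code are installed.

```python
# -*- coding: utf-8 -*-
# Sketch: OCR an image that mixes several languages (assumes Pillow and pytesseract are installed).


def build_lang(*codes):
    """Join Tesseract language codes with '+', e.g. ('chi_sim', 'eng') -> 'chi_sim+eng'."""
    cleaned = [code.strip() for code in codes if code and code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    return "+".join(cleaned)


def recognize(image_path, *codes):
    """Run OCR on image_path; defaults to Simplified Chinese when no codes are given."""
    # Imported here so build_lang can be used even without Pillow/pytesseract installed.
    from PIL import Image
    import pytesseract

    lang = build_lang(*codes) if codes else "chi_sim"
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)
```

For example, recognize('your_image.png', 'chi_sim', 'eng') would read a hypothetical image containing both Chinese and English, provided the eng language data is installed alongside chi_sim.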
3. Common errors and fixes
1. FileNotFoundError: [WinError 2] The system cannot find the specified file.
Solution:
Open the file, find the following line, and change the value of tesseract_cmd to the full path of the Tesseract executable. After that, the error is no longer reported.
tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
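Instead of editing the installed library file, the same variable can be set at runtime through the module attribute pytesseract.pytesseract.tesseract_cmd. The sketch below first looks for tesseract on PATH with shutil.which and then falls back to a list of candidate locations. The helper names find_tesseract and configure_tesseract are hypothetical, and the default Windows path is an assumption to adjust for your own installation.

```python
import os
import shutil

# Assumed install location; change it to wherever tesseract-ocr was installed.
DEFAULT_CANDIDATES = (r"C:\Program Files (x86)\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates=DEFAULT_CANDIDATES, use_path=True):
    """Return the path of a tesseract executable, or None if none is found."""
    if use_path:
        found = shutil.which("tesseract")  # searches the PATH environment variable
        if found:
            return found
    for candidate in candidates:
        if os.path.isfile(candidate):
            return candidate
    return None


def configure_tesseract(candidates=DEFAULT_CANDIDATES):
    """Point pytesseract at the located executable (requires pytesseract)."""
    import pytesseract  # imported lazily so find_tesseract works without it

    path = find_tesseract(candidates)
    if path is None:
        raise FileNotFoundError("tesseract executable not found; install tesseract-ocr or pass its path")
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling configure_tesseract() once at program start avoids the WinError 2 message without touching the library source, and keeps working after pytesseract is upgraded.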
2.:(1,'Error opening data file\\Progr
Solution:
Open the file, find image_to_string, and set the default value of its config parameter as follows:
tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
def image_to_string(image, lang=None, boxes=False, config=tessdata_dir_config):
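The same option can also be passed per call through the config parameter of image_to_string, which avoids modifying the library. This error usually means Tesseract could not open a language's .traineddata file, so the sketch below additionally checks that the tessdata directory contains a file for each requested language before running OCR. The helper names are hypothetical, and the directory is the example path used above.

```python
import os

# Example tessdata location from the article; adjust it to your installation.
TESSDATA_DIR = r"C:\Program Files (x86)\Tesseract-OCR\tessdata"


def tessdata_config(tessdata_dir):
    """Build the --tessdata-dir option; the double quotes keep paths with spaces intact."""
    return '--tessdata-dir "{}"'.format(tessdata_dir)


def missing_traineddata(tessdata_dir, lang):
    """Return the codes in lang (e.g. 'chi_sim+eng') that have no .traineddata file."""
    missing = []
    for code in lang.split("+"):
        if not os.path.isfile(os.path.join(tessdata_dir, code + ".traineddata")):
            missing.append(code)
    return missing


def recognize_with_tessdata(image_path, lang="chi_sim", tessdata_dir=TESSDATA_DIR):
    """OCR an image with an explicit tessdata directory (requires Pillow and pytesseract)."""
    from PIL import Image
    import pytesseract

    missing = missing_traineddata(tessdata_dir, lang)
    if missing:
        raise FileNotFoundError("missing language data: " + ", ".join(missing))
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang, config=tessdata_config(tessdata_dir))
```

If missing_traineddata reports chi_sim, the Simplified Chinese language data still has to be added to the tessdata directory before recognition will work.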
I hope this article helps with your Python programming.