Updated on 2025-03-02

How to Recognize Chinese Text in Images with Python 3

This article shows how to extract Chinese text from an image in Python 3. It is shared here for your reference, as follows:

1. Operating environment

(1) Windows 10

(2) PyCharm

(3) Python 3.5

(4) Install the pillow and pytesseract libraries:

pip3 install pillow
pip3 install pytesseract

(5) Install the recognition engine, tesseract-ocr: download it, unzip it, and run the installer. Download address: https:///softs/
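
Before running any recognition code, it can help to confirm that the pieces listed above are visible to Python. The following is a minimal sketch, not part of the original article: check_environment is a hypothetical helper name, and it assumes that pillow is imported as PIL and that the engine's executable is named tesseract. It uses only the standard library, so it runs even when something is missing.

```python
import importlib.util
import shutil


def check_environment(modules=("PIL", "pytesseract"), executable="tesseract"):
    """Return a dict mapping each module/executable name to True if it was found."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be imported.
        report[name] = importlib.util.find_spec(name) is not None
    # shutil.which returns None when the executable is not on PATH.
    report[executable] = shutil.which(executable) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

If tesseract is reported as missing even though it is installed, the executable is probably just not on PATH; the first error in section 3 covers that case.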

2. Run the code

# -*- coding: utf-8 -*-
from PIL import Image
import pytesseract
# The imports above are all the setup needed; the single call below performs the text recognition.
# NOTE: the image path was left blank in the original listing; pass the path of your own image to Image.open().
text = pytesseract.image_to_string(Image.open(''), lang='chi_sim')  # recognize Simplified Chinese text
# text = pytesseract.image_to_string(Image.open(''), lang='eng')  # recognize English letters and Arabic numerals
print(text)
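
The recognition call can also be wrapped in a small function so that a wrong image path is reported clearly, rather than being confused with the engine errors described in section 3. This is a sketch under assumptions: recognize_text is a hypothetical helper name, and the pillow and pytesseract imports are deferred into the function only so that the path check still works when those libraries are not installed.

```python
import os


def recognize_text(image_path, lang="chi_sim"):
    """Return the text that tesseract recognizes in the image at image_path."""
    # Fail early, with a clear message, if the image itself is missing.
    if not os.path.isfile(image_path):
        raise FileNotFoundError("Image not found: {}".format(image_path))

    # Deferred imports: only needed once there is an image to read.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as img:
        # lang='chi_sim' selects Simplified Chinese; use 'eng' for English.
        return pytesseract.image_to_string(img, lang=lang)
```

A call would then look like print(recognize_text(path_to_your_image)), where path_to_your_image is whatever file you want to read.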

3. Troubleshooting errors

1. FileNotFoundError: [WinError 2] The system cannot find the specified file.

Solution:

Open the pytesseract source file that defines tesseract_cmd, find the following line, and change the value of tesseract_cmd to the full path of the tesseract executable. After that, the error should no longer appear.

tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract'
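
As an alternative to editing the installed library, pytesseract also lets the same variable be set from your own script, via pytesseract.pytesseract.tesseract_cmd. The sketch below shows one way to do that. find_tesseract and configure_pytesseract are hypothetical helper names, and the default candidate path is the one from this article with .exe appended, which is an assumption about the install location.

```python
import os
import shutil


def find_tesseract(candidates=("C:/Program Files (x86)/Tesseract-OCR/tesseract.exe",)):
    """Return a path to the tesseract executable, or None if none is found."""
    # Prefer whatever is already reachable through PATH.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # Otherwise fall back to the known install locations (assumed paths).
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


def configure_pytesseract(path):
    """Point pytesseract at the given executable without editing its source."""
    import pytesseract  # deferred so find_tesseract works without the library

    pytesseract.pytesseract.tesseract_cmd = path
```

Calling configure_pytesseract(find_tesseract()) once, before the first image_to_string call, has the same effect as the manual edit, and it survives reinstalling or upgrading the library.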

2.:(1,'Error opening data file\\Progr

Solution:

Open the same file, find the definition of image_to_string, and set the default value of its config parameter so that it points at the tessdata directory, as follows:

tessdata_dir_config = '--tessdata-dir "C:\\Program Files (x86)\\Tesseract-OCR\\tessdata"'
def image_to_string(image, lang=None, boxes=False, config=tessdata_dir_config):
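
The same option can be passed on each call through the config argument of image_to_string, which avoids changing the library's function signature. The following is a sketch rather than the author's method: build_tessdata_config and recognize_with_tessdata are hypothetical helper names, and the directory is whatever tessdata folder your installation uses.

```python
def build_tessdata_config(tessdata_dir):
    """Build the --tessdata-dir option string; quotes protect spaces in the path."""
    return '--tessdata-dir "{}"'.format(tessdata_dir)


def recognize_with_tessdata(image_path, tessdata_dir, lang="chi_sim"):
    """Run recognition with an explicit tessdata directory for this call only."""
    # Deferred imports so build_tessdata_config can be used on its own.
    from PIL import Image
    import pytesseract

    config = build_tessdata_config(tessdata_dir)
    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang, config=config)
```

For the directory used in this article, build_tessdata_config(r"C:\Program Files (x86)\Tesseract-OCR\tessdata") produces the same string as tessdata_dir_config above.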


I hope this article is helpful for your Python programming.