Code examples for identifying and processing verification codes using Python

1. Types of verification codes

Before introducing the recognition method, let’s first understand the common types of verification codes:

Calculation verification code: The user requires simple mathematical calculations, such as "3+5=?".
Slider verification code: The user needs to drag the slider to the correct position to complete the verification.
Image recognition verification code: Verification is completed by identifying characters or patterns in the picture.
Voice verification code: The verification code content is reported through voice, and the user enters the content he hears.

This article focuses on the identification method of graph recognition verification codes, because this type of verification code is most common in automated testing.

2. Introduction to OCR technology

OCR (Optical Character Recognition) technology refers to the process of scanning characters and then translating them into electronic text through their shape. OCR technology plays an important role in verification code recognition.

There are multiple OCR libraries in Python that can be used, such as tesseract, pytesseract, pyocr, etc. Among them, tesseract is a powerful OCR engine open source of Google, and pytesseract and pyocr both have a Python API encapsulation for tesseract, which is convenient for us to call in Python.

3. Use OCR technology to identify verification codes

1. Install the required libraries

First, we need to install the tesseract engine and the pytesseract library. At the same time, some image processing libraries are also needed, such as PIL (Pillow) or OpenCV.

# Install tesseract (taking Windows as an example)# Download the tesseract installation package and install it to the specified directory, such as C:\Program Files\Tesseract-OCR 
# Install pytesseract and Pillowpip install pytesseract Pillow

After the installation is complete, pytesseract needs to be configured so that it can find the executable file of tesseract. In Python code, it can be implemented by setting .tesseract_cmd.

import pytesseract
 
# Set the path to the tesseract executable file.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\'

2. Download and process verification code pictures

Next, we need to download the verification code picture and perform some preprocessing to improve the recognition accuracy of OCR.

import requests
from PIL import Image
 
# Download the verification code pictureurl = '/captcha'  # Replace with the actual verification code URLresponse = (url)
with open('', 'wb') as f:
    ()
 
# Open the image and preprocess itimage = ('')
# Convert to grayscale imagegray_image = ('L')
# Binary processingthreshold = 127
table = []
for i in range(256):
    if i &lt; threshold:
        (0)
    else:
        (1)
binary_image = (table, '1')

3. Use OCR for identification

After preprocessing, we can use pytesseract to convert the image to text.

# Use pytesseract to identifytext = pytesseract.image_to_string(binary_image)
print('Identification results:', text)

4. Complete code example

The following is a complete sample code that shows the entire process from downloading verification code images to identifying text.

import requests
from PIL import Image
import pytesseract
 
# Set the path to the tesseract executable file.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\'
 
# Download the verification code pictureurl = '/captcha'  # Replace with the actual verification code URLresponse = (url)
with open('', 'wb') as f:
    ()
 
# Open the image and preprocess itimage = ('')
gray_image = ('L')
threshold = 127
table = []
for i in range(256):
    if i &lt; threshold:
        (0)
    else:
        (1)
binary_image = (table, '1')
 
# Use pytesseract to identifytext = pytesseract.image_to_string(binary_image)
print('Identification results:', text)

4. Process complex verification codes

For some complex verification codes, such as verification codes with elements such as rotation, puzzle, and sliding, OCR technology may not be able to directly identify them. At this time, we can use some professional coding platforms.

A coding platform is a third-party platform that provides verification code identification services. They usually have professional human or machine to identify various types of verification codes, and then return the results through the API interface. Of course, this kind of service requires payment, and the price varies according to the difficulty and quantity of verification codes.

There are multiple libraries for coding platforms in Python that can be used, such as Chaojiying, yundama, ruokuai, etc. They all provide corresponding API documentation and sample code for us to call in Python.

1. Register and recharge the coding platform account
First of all, we need to register an account on the coding platform and recharge it. After recharge, we can obtain the API key and API interface address.

2. Install and import the coding platform library
Taking Chaojiying as an example, we can use pip to install Chaojiying's Python library.

pip install chaojiying

After the installation is complete, import the library in the Python code.

from chaojiying import Chaojiying_Client

3. Call the coding platform API for identification

Next, we can use the API of the coding platform to perform verification code recognition. Here is a sample code.

from chaojiying import Chaojiying_Client
import requests
 
#Coding platform account informationusername = 'your_username'  # Replace with your accountpassword = 'your_password'  # Replace with your passwordsoft_id = 'your_soft_id'    # Replace with your software ID 
# Initialize the coding platform clientclient = Chaojiying_Client(username, password, soft_id)
 
# Download the verification code pictureurl = '/captcha'  # Replace with the actual verification code URLresponse = (url)
with open('', 'wb') as f:
    ()
 
# Call the coding platform API for identificationim_path = ''
result = (im_path, '')
 
# Analyze and identify resultsif result['err_no'] == 0:
    print('Identification results:', result['pic_str'])
else:
    print('Recognition failed:', result['err_msg'])

5. Case: Identify the verification code of ancient poetry and prose website

The following is a specific case showing how to use Python to identify the verification code of the ancient poetry website.

import requests
from PIL import Image
import pytesseract
 
# Set the path to the tesseract executable file.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\'
 
# Download the verification code pictureurl = '/'
response = (url)
with open('', 'wb') as f:
    ()
 
# Open the image and preprocess itimage = ('')
gray_image = ('L')
threshold = 127
table = []
for i in range(256):
    if i &lt; threshold:
        (0)
    else:
        (1)
binary_image = (table, '1')
 
# Use pytesseract to identifytext = pytesseract.image_to_string(binary_image)
print('Identification results:', ())

6. Summary

This article details how to use Python to identify and process verification codes. Through OCR technology and coding platform, we can realize the identification of simple and complex verification codes. In practical applications, we need to select appropriate recognition methods based on the type and difficulty of verification codes, and perform corresponding pre-processing and post-processing to improve the accuracy and stability of recognition.

The above is the detailed content of the code examples for using Python to identify and process verification codes. For more information about Python to identify and process verification codes, please pay attention to my other related articles!