SoFunction
Updated on 2025-03-03

Example Python code for automatically recognizing and filling in verification codes

Preface

To automatically recognize and fill in verification codes (CAPTCHAs) on web pages, you need to combine web scraping, image recognition (OCR), and usually a browser automation tool such as Selenium. The following is a simple implementation that combines these techniques:

Step 1: Obtain the verification code image

First, download the verification code image from the web page. This usually means analyzing the page's HTML structure, finding the URL of the verification code image, and then downloading the image with the requests library.

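import requests

def download_captcha(url):
    # Fetch the image and fail early on HTTP errors instead of saving an error page
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # 'captcha.png' is a placeholder: the filename in the original listing was lost
    with open('captcha.png', 'wb') as f:
        f.write(response.content)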
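
Step 1 also involves locating the image URL in the page's HTML. A minimal sketch of that part, using only the standard library, might look like the following. The id value captcha_image, the sample HTML, and the helper names CaptchaImageFinder and find_captcha_url are assumptions for illustration; a real page will have its own markup.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CaptchaImageFinder(HTMLParser):
    """Collects the src of the first <img> whose id matches (hypothetical helper)."""

    def __init__(self, image_id):
        super().__init__()
        self.image_id = image_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "img" and self.src is None and attributes.get("id") == self.image_id:
            self.src = attributes.get("src")


def find_captcha_url(page_url, html, image_id="captcha_image"):
    """Return the absolute URL of the captcha image, or None if it is not found."""
    finder = CaptchaImageFinder(image_id)
    finder.feed(html)
    if finder.src is None:
        return None
    # Image URLs are often relative, so resolve them against the page URL
    return urljoin(page_url, finder.src)


# Made-up HTML fragment standing in for a downloaded page
sample_html = '<form><img id="captcha_image" src="/captcha/image?id=123"><input id="captcha_input"></form>'
print(find_captcha_url("https://example.com/login", sample_html))
```

The returned URL can then be passed to download_captcha.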

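Step 2: Image preprocessing and recognition

Next, use pytesseract and opencv-python to preprocess and recognize the downloaded verification code image.

First, make sure that these two libraries are installed. Note that pytesseract is only a wrapper, so the Tesseract OCR engine itself must also be installed on your system: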

pip install pytesseract opencv-python

You can then use the following Python code to identify the verification code:

import cv2
import pytesseract

def recognize_captcha(image_path):
    # Load the image
    image = cv2.imread(image_path)

    # Convert to a grayscale image
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Use Gaussian blur to reduce noise
    blurred_gray_image = cv2.GaussianBlur(gray_image, (5, 5), 0)

    # Use binarization (Otsu's threshold) to improve contrast
    _, binary_image = cv2.threshold(blurred_gray_image, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)

    # Run OCR with pytesseract
    recognized_text = pytesseract.image_to_string(binary_image, lang='eng')

    # Strip surrounding whitespace (a trailing newline would be typed into the form as Enter)
    return recognized_text.strip()

# Test the function
if __name__ == "__main__":
    captcha_image_path = "path/to/your/captcha/"  # Replace with your own verification code image path
    recognized_captcha = recognize_captcha(captcha_image_path)
    print("Recognized captcha:", recognized_captcha)
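
OCR output often contains stray spaces or punctuation, so it can help to sanity-check the text before using it. The sketch below assumes the verification code consists of exactly four letters or digits; that length, and the helper name clean_captcha_text, are assumptions for illustration and should be adjusted to the site in question.

```python
import re


def clean_captcha_text(raw_text, expected_length=4):
    """Drop non-alphanumeric characters; return None if the length looks wrong."""
    cleaned = re.sub(r"[^A-Za-z0-9]", "", raw_text)
    if len(cleaned) != expected_length:
        return None  # Signal that this recognition attempt should not be submitted
    return cleaned


print(clean_captcha_text(" a7 K9\n"))  # prints a7K9
print(clean_captcha_text("a7K"))       # prints None (too short)
```

Returning None for an implausible result lets the caller skip the submission instead of sending a code that is almost certainly wrong.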

Step 3: Use Selenium to simulate browser operations

Selenium is a powerful tool that simulates the behavior of real users, including filling out forms and clicking buttons. First install selenium:

pip install selenium

Make sure that the appropriate WebDriver (such as ChromeDriver) is available on your system; recent Selenium releases (4.6 and later) can usually locate or download a matching driver automatically. Then use Selenium to open the web page, locate the input box and the submit button, and fill in the recognized verification code.

from selenium import webdriver
from selenium.webdriver.common.by import By

def fill_captcha_and_submit(captcha_value, form_url):
    # Make sure that ChromeDriver is on your PATH, or specify its full path
    driver = webdriver.Chrome()
    try:
        driver.get(form_url)

        # Assume that the id of the input tag is 'captcha_input', and the id of the submit button is 'submit_button'
        # Selenium 4 locator style: find_element(By.ID, ...)
        captcha_input = driver.find_element(By.ID, 'captcha_input')
        submit_button = driver.find_element(By.ID, 'submit_button')

        captcha_input.send_keys(captcha_value)
        submit_button.click()
    finally:
        # Always close the browser, even if one of the steps above fails
        driver.quit()
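
One caveat: many sites generate a new verification code for each request or session, so an image downloaded separately with requests may not match the one displayed in the Selenium-driven browser. A possible workaround is to capture the image from the browser itself. The sketch below relies on the screenshot_as_png property of a Selenium WebElement; the function name save_captcha_from_browser and the locator passed to it are assumptions for illustration.

```python
def save_captcha_from_browser(driver, locator, save_path):
    """Screenshot the captcha element in the live browser session and save it as PNG.

    driver is a Selenium WebDriver; locator is a (by, value) tuple such as
    (By.ID, 'captcha_image'), where the id is a hypothetical example.
    """
    captcha_element = driver.find_element(*locator)
    png_bytes = captcha_element.screenshot_as_png  # PNG data of just this element
    with open(save_path, "wb") as f:
        f.write(png_bytes)
    return save_path
```

With By imported from selenium.webdriver.common.by, it could be called as save_captcha_from_browser(driver, (By.ID, 'captcha_image'), 'captcha.png') after driver.get(form_url), and the saved file passed to recognize_captcha before the form is filled in within the same session.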

Integration process

Finally, integrate the above steps to achieve a complete automation process:

def main():
    captcha_url = "The URL of the verification code image in the web page"
    form_url = "The URL to submit the form"

    # Placeholder filename: it must match the path that download_captcha() writes to
    captcha_path = 'captcha.png'

    download_captcha(captcha_url)
    captcha_text = recognize_captcha(captcha_path)
    fill_captcha_and_submit(captcha_text, form_url)

if __name__ == "__main__":
    main()

Please note that automatically recognizing and filling in verification codes may violate a website's terms of service, and recognition accuracy can drop sharply for verification codes with complex designs, noise, or distortion. In addition, frequent automated requests may lead to an IP ban. In practice, make sure that you comply with the relevant laws, regulations, and terms of service.
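
Because recognition can fail, a script of this kind usually needs a bounded number of attempts with a pause between them rather than an unbounded loop. The following is a minimal sketch under stated assumptions: the function name solve_with_retries and its three callables are a hypothetical interface, not part of the code above, and submit is assumed to return True when the site accepts the code.

```python
import time


def solve_with_retries(fetch_image, recognize, submit, max_attempts=3, delay_seconds=2.0):
    """Run fetch -> recognize -> submit at most max_attempts times.

    fetch_image() returns an image path, recognize(path) returns the text
    (or a falsy value when OCR fails), and submit(text) returns True on
    success. All three are caller-supplied callables.
    """
    for attempt in range(1, max_attempts + 1):
        image_path = fetch_image()
        text = recognize(image_path)
        if text and submit(text):
            return attempt  # Number of attempts that were needed
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # Pause so requests are not sent back to back
    return None  # Give up after max_attempts
```

Keeping max_attempts small and the delay generous limits the load placed on the site.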
