SoFunction
Updated on 2024-10-30

A Beginner's Tutorial on Recognizing CAPTCHAs in Python

Preamble

CAPTCHAs? Can I crack them too?

CAPTCHAs need little introduction: all kinds of them pop up in daily life. As a student, the one I run into most often is the CAPTCHA on the Registrar's Office system, which is the example used in this article.

Recognition method

A simulated login involves many steps. Here we ignore everything else and focus on one task: take a CAPTCHA image as input and return the answer as a string.

CAPTCHAs use colorful images to create interference, so the first thing to do is remove it. This step takes repeated experimentation; enhancing the image's color, increasing its contrast, and similar adjustments can all help.
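As a minimal sketch of the cleanup step, assume the image has already been enhanced so that the character strokes are pure black. Every pixel that is not exactly (0, 0, 0) is then turned white. The helper name `binarize` and the list-of-rows image representation are illustrative assumptions; on a PIL image the same test would be applied pixel by pixel.

```python
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)


def binarize(pixels):
    """Return a copy of the image in which every non-black pixel is white.

    `pixels` is a list of rows, each row a list of (R, G, B) tuples.
    """
    return [[BLACK if p == BLACK else WHITE for p in row] for row in pixels]


# Tiny 2x3 example: two black stroke pixels among coloured noise.
noisy = [
    [(200, 30, 30), BLACK, (10, 10, 10)],
    [(0, 0, 255), WHITE, BLACK],
]
clean = binarize(noisy)
```

Note that a near-black pixel such as (10, 10, 10) is also whitened, which is why the enhancement step has to push the strokes all the way to pure black first.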

After trying various manipulations, we found a combination that removes the interference cleanly. In the best case, what remains is a pure black-and-white picture of the characters. Each picture contains four characters, and they cannot all be recognized at once, so the picture has to be cropped into four small pictures with one character each, and each small picture is then recognized separately.
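The cropping step can be sketched as follows. The left offsets 6, 19, 32, 45 and the 13x20 cell size are taken from the sample code in this article; the function name `split_characters` and the plain 2-D list standing in for the image are assumptions made for illustration.

```python
CHAR_OFFSETS = [6, 19, 32, 45]  # left edge of each character cell
CHAR_WIDTH = 13
CHAR_HEIGHT = 20


def split_characters(grid):
    """Cut a 2-D grid (list of rows) into one tile per character."""
    tiles = []
    for left in CHAR_OFFSETS:
        tile = [row[left:left + CHAR_WIDTH] for row in grid[:CHAR_HEIGHT]]
        tiles.append(tile)
    return tiles


# Demo grid of 58 columns, a width chosen only so that all four tiles fit.
# Column 19 is marked, so it should appear as the first column of tile 1.
grid = [[1 if x == 19 else 0 for x in range(58)] for _ in range(20)]
tiles = split_characters(grid)
```

Fixed offsets only work because every character in this CAPTCHA sits at the same position; a CAPTCHA with shifting characters would need a real segmentation step.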

The next step is to recognize the characters. We first convert each small picture into a matrix of 0s and 1s, so that each matrix represents one character.
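A small sketch of that conversion, assuming the tile is a list of rows of (R, G, B) tuples that has already been reduced to pure black and white. The name `to_bits` is hypothetical. Black pixels become 1 and everything else becomes 0, read row by row.

```python
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)


def to_bits(tile):
    """Flatten a tile of (R, G, B) pixels, row by row, into a list of 0/1."""
    return [1 if pixel == BLACK else 0 for row in tile for pixel in row]


tile = [
    [WHITE, BLACK, WHITE],
    [BLACK, BLACK, WHITE],
]
bits = to_bits(tile)  # [0, 1, 0, 1, 1, 0]
```

For a 13x20 character cell the resulting list has 260 entries.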

For example, the matrix for the digit 6:

num_6=[
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,1,1,0,0,0,0,0,0,
0,0,0,0,1,1,1,0,0,0,0,0,0,
0,0,0,1,1,1,0,0,0,0,0,0,0,
0,0,0,1,1,0,0,0,0,0,0,0,0,
0,0,1,1,0,0,0,0,0,0,0,0,0,
0,0,1,1,0,0,0,0,0,0,0,0,0,
0,1,1,1,1,1,1,1,0,0,0,0,0,
0,1,1,1,1,1,1,1,1,0,0,0,0,
0,1,1,0,0,0,0,1,1,1,0,0,0,
0,1,1,0,0,0,0,0,1,1,0,0,0,
0,1,1,0,0,0,0,0,1,1,0,0,0,
0,1,1,1,0,0,0,1,1,1,0,0,0,
0,0,1,1,1,1,1,1,1,0,0,0,0,
0,0,0,1,1,1,1,1,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
0,0,0,0,0,0,0,0,0,0,0,0,0,
]

If you squint at it from a distance, you can still make out the digit.
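To make the shape easier to see than by squinting, a flat 0/1 list can be printed as rows of characters. This is only a viewing aid and is not part of the original code; the function name `render` is an assumption, and the default width of 13 matches the rows of num_6.

```python
def render(bits, width=13):
    """Turn a flat 0/1 list into text rows, '#' for 1 and '.' for 0."""
    rows = []
    for start in range(0, len(bits), width):
        row = bits[start:start + width]
        rows.append("".join("#" if b else "." for b in row))
    return "\n".join(rows)


# The third and fourth rows of num_6.
sample = [
    0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0,
    0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0,
]
print(render(sample))
```

Calling `render(num_6)` on the full list would draw the whole digit in the same way.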

Because this CAPTCHA is very regular and each character sits at a fixed position, no machine learning algorithm is needed; a simple matrix comparison is enough. Among all the reference matrices prepared in advance, we pick the one most similar to the input. There are many ways to measure similarity, and the simplest one that recognizes the data correctly is good enough.
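One simple way to do the comparison, sketched under the assumption that similarity means the number of positions at which two 0/1 lists agree. The names `similarity`, `recognize`, and `templates` are illustrative, and the tiny 2x3 templates are toy data rather than real CAPTCHA glyphs.

```python
def similarity(a, b):
    """Count the positions at which two equal-length 0/1 lists agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def recognize(bits, templates):
    """Return the label of the template that agrees with `bits` most often."""
    return max(templates, key=lambda label: similarity(bits, templates[label]))


# Toy 2x3 templates keyed by the character they represent.
templates = {
    "1": [0, 1, 0,
          0, 1, 0],
    "7": [1, 1, 1,
          0, 0, 1],
}
noisy_seven = [1, 1, 1,
               0, 1, 1]  # one pixel differs from the "7" template
print(recognize(noisy_seven, templates))  # 7
```

Counting agreements tolerates a few flipped pixels, which is usually enough when the characters never move, rotate, or change font.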

At this point, we are done with CAPTCHA recognition.

The CAPTCHA recognition here mainly uses Python's PIL for image manipulation. The full code for simulating a login and automatically filling in the CAPTCHA is as follows:

Sample code

# -*- coding: utf-8 -*-
import io
import json
import re

import requests
from bs4 import BeautifulSoup
from PIL import Image
from PIL import ImageEnhance

# Local module holding the 0/1 reference matrix for each character
# (for example the num_6 list shown earlier).
import mdata

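class Student:
    def __init__(self, user, password):
        self.user = str(user)
        self.password = str(password)
        # One session for all requests so the login cookie is reused.
        self.session = requests.Session()

    def login(self):
        url = "http://202.118.31.197/?mode=4"
        res = self.session.get(url).text
        # Locate the CAPTCHA image on the login page and download it.
        imageUrl = 'http://202.118.31.197/' + re.findall('<img src="(.+?)" width="55"', res)[0]
        im = Image.open(io.BytesIO(self.session.get(imageUrl).content)).convert("RGB")
        # Assumption: the enhancer is ImageEnhance.Contrast; the factor 7 is from the original.
        enhancer = ImageEnhance.Contrast(im)
        im = enhancer.enhance(7)
        # Remove interference: every pixel that is not pure black becomes white.
        width, height = im.size
        for y in range(height):
            for x in range(width):
                if im.getpixel((x, y)) != (0, 0, 0):
                    im.putpixel((x, y), (255, 255, 255))
        # Left edge of each of the four 13x20 character cells.
        num = [6, 19, 32, 45]
        verifyCode = ""
        for left in num:
            cell = im.crop((left, 0, left + 13, 20))
            # Flatten the cell row by row into 260 values: 1 = black, 0 = white.
            bits = []
            w, h = cell.size
            for y in range(h):
                for x in range(w):
                    if cell.getpixel((x, y)) == (0, 0, 0):
                        bits.append(1)
                    else:
                        bits.append(0)
            # Pick the reference matrix that agrees with the cell in the most positions.
            # Assumption: mdata.data is a dict mapping each character to its 0/1 list.
            best = 0
            char = ""
            for key in mdata.data:
                matches = 0
                for j in range(260):
                    if bits[j] == mdata.data[key][j]:
                        matches += 1
                if matches > best:
                    best = matches
                    char = key
            verifyCode += char
        data = {
            'WebUserNO': self.user,
            'Password': self.password,
            'Agnomen': verifyCode,
        }
        url = "http://202.118.31.197/?mode=4"
        t = self.session.post(url, data=data).text
        if re.findall("images/Logout2", t) == []:
            # No logout link on the page: login failed, so report the alert text.
            # Assumption: the two values appended here are the user name and password.
            l = '[0,"' + re.findall('alert((.+?));', t)[1][1][2:-2] + '"]' + " " + self.user + " " + self.password + "\n"
            return [False, l]
        else:
            l = 'Login Successful ' + re.findall('!&nbsp;(.+?)&nbsp;', t)[0] + " " + self.user + " " + self.password + "\n"
            return [True, l]

    def getInfo(self):
        # Student registration information
        data = self.session.get('http://202.118.31.197/?mode=3').text
        data = BeautifulSoup(data, "lxml")
        q = data.find_all("table", attrs={'align': "left"})
        # Collect the cells of the first two tables.
        # Assumption: the value appended for each cell is its .string.
        a = []
        for table in (q[0], q[1]):
            for i in table:
                if type(i) == type(table):
                    for j in i:
                        if type(j) == type(i):
                            a.append(j.string)
        # Cells alternate between field name and value.
        data = {}
        for i in range(1, len(a), 2):
            data[a[i - 1]] = a[i]
        return json.dumps(data)

    def getPic(self):
        imageUrl = 'http://202.118.31.197/'
        pic = Image.open(io.BytesIO(self.session.get(imageUrl).content))
        return pic

    def getScore(self):
        # Transcripts
        score = self.session.get('http://202.118.31.197/').text
        score = BeautifulSoup(score, "lxml")
        q = score.find_all(attrs={'height': "36"})[0]
        # Assumption: the GPA summary is the text of this element.
        point = q.text
        print(point[point.find('GPA'):])
        # Assumption: the original narrowed the search to one element here;
        # the whole page is searched instead.
        table = score
        people = table.find_all(attrs={'height': '36'})[0].string
        r = table.find_all('table', attrs={'align': 'left'})[0].find_all('tr')
        subject = []
        lesson = []
        # The first row holds the column headers.
        for i in r[0]:
            if type(r[0]) == type(i):
                subject.append(i.string)
        for i in r:
            k = 0
            temp = {}
            for j in i:
                if type(r[0]) == type(j):
                    temp[subject[k]] = j.string
                    k += 1
            lesson.append(temp)
        # Assumption: the last row and the header row are dropped here.
        lesson.pop()
        lesson.pop(0)
        return json.dumps(lesson)

    def logoff(self):
        return self.session.get('http://202.118.31.197/').text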

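if __name__ == "__main__":
    a = Student(20150000, 20150000)
    r = a.login()
    print(r[1])
    if r[0]:
        # getScore and getInfo return JSON strings, so decode them first.
        r = json.loads(a.getScore())
        for i in r:
            print(" ".join(str(i[j]) for j in i))
        q = json.loads(a.getInfo())
        for i in q:
            print("%s %s" % (i, q[i]))
        a.getPic().show()
    a.logoff()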

Summary

That is all for this article. I hope it is of some help to your study or use of Python. If you have any questions, feel free to leave a message. Thank you for your support.