Preface
CAPTCHA? Can I crack it too?
CAPTCHAs need little introduction: all sorts of them pop up in daily life. As a student, the one I run into most often is the CAPTCHA on the Registrar's Office system, such as the following CAPTCHA:
Recognition method
A simulated login involves many steps. Here we ignore the other operations and focus on one task: take a CAPTCHA image as input and return the answer as a string.
CAPTCHAs color the image to create interference, so the first thing to do is remove that interference. This step takes repeated experimentation; enhancing the image's color, increasing its contrast, and similar adjustments can all help.
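As a rough sketch of this clean-up step, the snippet below boosts the contrast and then turns every pixel that is not pure black into white. It is written for Python 3 with Pillow. The choice of `ImageEnhance.Contrast` is an assumption (the article does not say which enhancer worked best), the function name `remove_interference` is made up for illustration, and the factor 7 mirrors the value that appears in the full listing.

```python
from PIL import Image, ImageEnhance


def remove_interference(im, factor=7.0):
    """Return a cleaned copy: pure-black pixels stay, everything else turns white."""
    # Assumption: a contrast boost is the enhancement that helps here.
    im = ImageEnhance.Contrast(im.convert("RGB")).enhance(factor)
    width, height = im.size
    for y in range(height):
        for x in range(width):
            if im.getpixel((x, y)) != (0, 0, 0):
                im.putpixel((x, y), (255, 255, 255))
    return im
```

The right enhancer and factor depend on the CAPTCHA, so expect to try several values before the result is clean.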
After various manipulations of the image, we finally found a combination that removes the interference. In the best case, what remains is a clean black-and-white image of the characters. Each image contains four characters, and they cannot all be recognized at once, so the image has to be cropped into four small images with one character each, and each small image is then recognized separately.
The next step is to recognize the characters. We first convert each small image into a matrix of 0s and 1s; each matrix represents one character.
For example, the matrix for the digit 6:
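The cropping and conversion steps could look roughly like the following Python 3 / Pillow sketch. The left edges 6, 19, 32, 45 and the 13×20 tile size are taken from the full listing. The function name `split_into_matrices` and the constant names are hypothetical, and the image is assumed to have already been reduced to pure black and white.

```python
from PIL import Image

CHAR_LEFT_EDGES = (6, 19, 32, 45)   # x offset of each character, from the listing
CHAR_WIDTH, CHAR_HEIGHT = 13, 20    # size of one character tile, from the listing


def split_into_matrices(im):
    """Crop the four characters and flatten each into a row-major list of 0/1."""
    im = im.convert("RGB")
    matrices = []
    for left in CHAR_LEFT_EDGES:
        tile = im.crop((left, 0, left + CHAR_WIDTH, CHAR_HEIGHT))
        cells = [
            1 if tile.getpixel((x, y)) == (0, 0, 0) else 0
            for y in range(CHAR_HEIGHT)
            for x in range(CHAR_WIDTH)
        ]
        matrices.append(cells)
    return matrices
```

Each returned list has 13 × 20 = 260 entries, with 1 marking a black pixel.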
num_6=[ 0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,1,1,0,0,0,0,0,0, 0,0,0,0,1,1,1,0,0,0,0,0,0, 0,0,0,1,1,1,0,0,0,0,0,0,0, 0,0,0,1,1,0,0,0,0,0,0,0,0, 0,0,1,1,0,0,0,0,0,0,0,0,0, 0,0,1,1,0,0,0,0,0,0,0,0,0, 0,1,1,1,1,1,1,1,0,0,0,0,0, 0,1,1,1,1,1,1,1,1,0,0,0,0, 0,1,1,0,0,0,0,1,1,1,0,0,0, 0,1,1,0,0,0,0,0,1,1,0,0,0, 0,1,1,0,0,0,0,0,1,1,0,0,0, 0,1,1,1,0,0,0,1,1,1,0,0,0, 0,0,1,1,1,1,1,1,1,0,0,0,0, 0,0,0,1,1,1,1,1,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0, ]
Squint at the matrix from a distance and you can still make out the digit traced by the 1s.
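To check a matrix by eye without squinting, a small helper can print it as a grid. The sketch below assumes the flat list is stored row by row with 13 values per row, which matches the 260 entries of num_6 and the 13×20 crop in the listing. The function name `render` is made up for illustration.

```python
def render(cells, width=13):
    """Turn a flat row-major 0/1 list into text, '#' for 1 and '.' for 0."""
    rows = [cells[i:i + width] for i in range(0, len(cells), width)]
    return "\n".join("".join("#" if c else "." for c in row) for row in rows)


# Tiny 3-column example; for a real character call render(num_6).
print(render([0, 1, 0,
              1, 1, 1], width=3))
```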
Because this CAPTCHA is very regular, with each digit at a fixed position, there is no need for any machine learning algorithm; a simple matrix comparison is enough. Compare the input against every prepared matrix and pick the one with the highest similarity. There are many ways to do the comparison; any simple method that recognizes the data correctly will do.
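One simple way to do the comparison is to count the positions where two matrices agree and keep the template with the highest count, which is also what the loop in the full listing does. The sketch below is Python 3. The names `similarity`, `recognize` and `read_captcha` are hypothetical, and `templates` stands for a dictionary that maps each character to its prepared 0/1 list.

```python
def similarity(a, b):
    """Number of positions at which two equally long 0/1 lists agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def recognize(cells, templates):
    """Return the label of the template most similar to cells."""
    return max(templates, key=lambda label: similarity(cells, templates[label]))


def read_captcha(matrices, templates):
    """Recognize each character matrix and join the labels into one string."""
    return "".join(recognize(cells, templates) for cells in matrices)
```

Counting matching cells is crude, but it is enough when every character always lands on the same pixels; a noisier CAPTCHA would call for a more tolerant measure.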
At this point, we are done with CAPTCHA recognition.
The CAPTCHA recognition here mainly uses Python's PIL for image manipulation, and the full code for simulating the login and automatically filling in the CAPTCHA can be found here:
Sample code
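# -*- coding: utf-8 -*-
# Python 2 listing. The receivers of several calls were missing from the source;
# self.user, self.password, self.s, mdata.data, ImageEnhance.Contrast,
# json.dumps/json.loads and every line tagged "assumed" are reconstructed from
# context and may differ from the author's names.
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
import re
import requests
import io
import os
import json
from PIL import Image
from PIL import ImageEnhance
from bs4 import BeautifulSoup
import mdata  # the author's module holding the prepared character matrices


class Student:
    def __init__(self, user, password):
        self.user = str(user)
        self.password = str(password)
        self.s = requests.Session()

    def login(self):
        url = "http://202.118.31.197/?mode=4"
        res = self.s.get(url).text
        imageUrl = 'http://202.118.31.197/' + re.findall('<img src="(.+?)" width="55"', res)[0]
        im = Image.open(io.BytesIO(self.s.get(imageUrl).content))
        enhancer = ImageEnhance.Contrast(im)  # assumed enhancer
        im = enhancer.enhance(7)
        # Remove interference: anything that is not pure black becomes white.
        x, y = im.size
        for i in range(y):
            for j in range(x):
                if im.getpixel((j, i)) != (0, 0, 0):
                    im.putpixel((j, i), (255, 255, 255))
        num = [6, 19, 32, 45]  # left edge of each of the four characters
        verifyCode = ""
        for i in range(4):
            # Crop one 13x20 character and flatten it into a 0/1 list.
            a = im.crop((num[i], 0, num[i] + 13, 20))
            l = []
            x, y = a.size
            for i in range(y):
                for j in range(x):
                    if a.getpixel((j, i)) == (0, 0, 0):
                        l.append(1)
                    else:
                        l.append(0)
            # Pick the template that agrees with the input in the most cells.
            his = 0
            chrr = ""
            for i in mdata.data:  # assumed attribute name
                r = 0
                for j in range(260):
                    if l[j] == mdata.data[i][j]:
                        r += 1
                if r > his:
                    his = r
                    chrr = i
            verifyCode += chrr
        # print "Assisted input verification code complete:",verifyCode
        data = {
            'WebUserNO': str(self.user),
            'Password': str(self.password),
            'Agnomen': verifyCode,
        }
        url = "http://202.118.31.197/?mode=4"
        t = self.s.post(url, data=data).text
        if re.findall("images/Logout2", t) == []:
            l = '[0,"' + re.findall('alert((.+?));', t)[1][1][2:-2] + '"]' + " " + self.user + " " + self.password + "\n"
            # print l
            # return '[0,"'+re.findall('alert((.+?));',t)[1][1][2:-2]+'"]'
            return [False, l]
        else:
            l = 'Login Successful ' + re.findall('! (.+?) \n', t)[0] + " " + self.user + " " + self.password + "\n"
            # print l
            return [True, l]

    def getInfo(self):
        imageUrl = 'http://202.118.31.197/'
        data = self.s.get('http://202.118.31.197/?mode=3').text  # Student registration information
        data = BeautifulSoup(data, "lxml")
        q = data.find_all("table", attrs={'align': "left"})
        a = []
        for i in q[0]:
            if type(i) == type(q[0]):
                for j in i:
                    if type(j) == type(i):
                        a.append(j.string)  # assumed
        for i in q[1]:
            if type(i) == type(q[1]):
                for j in i:
                    if type(j) == type(i):
                        a.append(j.string)  # assumed
        data = {}
        for i in range(1, len(a), 2):
            data[a[i - 1]] = a[i]
        # data['photo'] = io.BytesIO(self.s.get(imageUrl).content)
        return json.dumps(data)

    def getPic(self):
        imageUrl = 'http://202.118.31.197/'
        pic = Image.open(io.BytesIO(self.s.get(imageUrl).content))
        return pic

    def getScore(self):
        score = self.s.get('http://202.118.31.197/').text  # Transcripts
        score = BeautifulSoup(score, "lxml")
        q = score.find_all(attrs={'height': "36"})[0]
        point = q.text  # assumed
        print point[point.find('GPA'):]
        table = score  # assumed
        people = table.find_all(attrs={'height': '36'})[0].string
        r = table.find_all('table', attrs={'align': 'left'})[0].find_all('tr')
        subject = []
        lesson = []
        for i in r[0]:
            if type(r[0]) == type(i):
                subject.append(i.string)  # assumed
        for i in r:
            k = 0
            temp = {}
            for j in i:
                if type(r[0]) == type(j):
                    temp[subject[k]] = j.string  # assumed
                    k += 1
            lesson.append(temp)
        lesson.pop()   # assumed
        lesson.pop(0)  # assumed
        return json.dumps(lesson)

    def logoff(self):
        return self.s.get('http://202.118.31.197/').text


if __name__ == "__main__":
    a = Student(20150000, 20150000)
    r = a.login()
    print r[1]
    if r[0]:
        r = json.loads(a.getScore())
        for i in r:
            for j in i:
                print i[j],
            print
        q = json.loads(a.getInfo())
        for i in q:
            print i, q[i]
        a.getPic().show()
        a.logoff()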
Summary
That is all for this article. I hope it is of some help in learning or using Python. If you have any questions, feel free to leave a comment. Thank you for your support.