SoFunction
Updated on 2024-10-28

Recognizing handwritten digits in Python: the image recognition algorithm

A few words up front

This is arguably one of the hardest parts of the series. It deals with recognizing images, so the algorithms involved are harder than the earlier ones, and I'll try to explain them as clearly as possible.

While writing this part I also modified some of the earlier logic to improve it, so where this post and the earlier ones differ, this post takes precedence. The complete code is on my GitHub; this article mainly explains it, so if you need the code, please get it from GitHub.

Outline of this post

Last time we built the database and were able to store newly added training images in a CSV file as they appear. This time we move on to recognizing the content of the images.

First we need to read the images to be recognized from their folder and process them in the same way as the training images, so that each one becomes a vector of size 1x10000. Because the two cases differ slightly and I didn't want to add more branching to the existing code, I wrote a separate function for the images to be recognized, GetTestPicture. It is similar to GetTrainPicture, except that it leaves out the part that adds the picture name.
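To make the 1x10000 figure concrete, here is a minimal sketch of the flattening step only. The function name flatten_picture and the toy values are hypothetical; the real GetTestPicture is in the GitHub code and also handles loading and resizing, which this sketch does not attempt.

```python
def flatten_picture(rows):
    """Join the rows of a square picture into one long row vector."""
    vector = []
    for row in rows:
        vector.extend(row)
    return vector

# A 3x3 toy picture; with the article's N = 100 the result has 100 * 100 = 10000 values
toy = [
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
]
vec = flatten_picture(toy)
print(len(vec))

# Same operation at the article's size
big = flatten_picture([[0.0] * 100 for _ in range(100)])
print(len(big))
```

Every picture, training or test, has to go through the same flattening so that position k in one vector refers to the same pixel as position k in another.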

After that we can start on the actual image recognition.

The core idea is to calculate the distance between the image to be recognized and every training image. The closer two images are, the more similar they are, and the more likely they show the same digit. Using this principle, we find the few training images nearest to the image to be recognized and output the digit each of them represents. For example, if I output the nearest three and they are 3, 3 and 9, the image to be recognized is most likely a 3.
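The 3, 3, 9 example can be sketched as a simple majority vote. This is only one plausible way to turn the nearest digits into an answer, not necessarily what the project ends up doing (the weighting idea is left for a later post); majority_digit is a hypothetical helper.

```python
from collections import Counter

def majority_digit(nearest_digits):
    """Return the digit that appears most often among the nearest neighbors.

    nearest_digits is expected to be ordered nearest first.
    """
    counts = Counter(nearest_digits)
    digit, _ = counts.most_common(1)[0]
    return digit

print(majority_digit([3, 3, 9]))
```

When two digits have the same count, Counter.most_common keeps the one that was seen first, so with a nearest-first list the tie goes to the closer neighbor.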

Later you can also assign a weight to each rank; the details are left for another time, since this section already has enough content.

(In the first article I mentioned detecting the number of holes in an image. I tried it and found it of little use, for reasons given at the end of this article.)

Main code

Here is the main code; the logic is fairly clear.

import os
import OperatePicture as OP
import OperateDatabase as OD
import PictureAlgorithm as PA
import csv

## Essential variables
# Standard size
N = 100
# Gray threshold
color = 200/255
# Number of nearest training images to report
n = 10

# Read the original CSV file
reader = list(csv.reader(open('', encoding = 'utf-8')))
# Remove the blank first row after reading
del reader[0]
# Read all filenames in the num directory
fileNames = os.listdir(r"./num/")
# Compare fileNames with reader to get newFileNames of the added images
newFileNames = (fileNames, reader)
print('New pictures are: ', newFileNames)
# Get the matrix corresponding to newFilesNames
pic = (newFileNames)
#Store the matrix of added images in CSV
(pic, newFileNames)
# Merge the original database matrix with the new database matrix
pic = (reader, pic)

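# Get the image to be recognized
testFiles = os.listdir(r"./test/")
# GetTestPicture is assumed to live in OperatePicture (OP)
testPic = OP.GetTestPicture(testFiles)

# Calculate the possible classifications for each image to be recognized
# CalculateResult is assumed to live in PictureAlgorithm (PA)
result = PA.CalculateResult(testPic, pic)
for item in result:
    for i in range(n):
        print('Vector '+str(i+1)+' is '+str(item[i+n])+', the distance is '+str(item[i]))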

Compared with the previous post, only the last part of the code above is new: getting the names of the images to be recognized, getting their vectors, and computing the classification.

Below we focus on the CalculateResult function, which holds the algorithm for recognizing images.

Algorithm content

The algorithm in brief

We've already covered this briefly in the outline, so I'll just copy it over and add a few more things.

Suppose we have two points A=(1,1) and B=(5,5) in the two-dimensional plane, and I add another point C=(2,2). Which of the two is C closer to?

Anyone who has taken middle school math knows it is closer to A. Now put it another way: we have two classes A and B, where class A contains the point (1,1) and class B contains the point (5,5). Which class might the point (2,2) belong to?

Since this point is closer to the point in class A, it probably belongs to class A. The same holds in 3-dimensional space: if class A contains the point (1,1,1) and class B contains (5,5,5), the point (2,2,2) is again closer to class A.
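The two examples above can be checked in a few lines. This is a minimal sketch using the standard library's math.dist (Python 3.8 or later); nearest_class is a hypothetical helper, and each class holds a single point, as in the text.

```python
import math

def nearest_class(point, classes):
    """Return the name of the class whose single point is closest to point."""
    return min(classes, key=lambda name: math.dist(point, classes[name]))

# Two-dimensional example: A=(1,1), B=(5,5), C=(2,2)
classes_2d = {"A": (1, 1), "B": (5, 5)}
dist_to_a = math.dist((2, 2), classes_2d["A"])  # sqrt(2)
dist_to_b = math.dist((2, 2), classes_2d["B"])  # sqrt(18)
print(dist_to_a, dist_to_b, nearest_class((2, 2), classes_2d))

# Three-dimensional example: the same code works unchanged
classes_3d = {"A": (1, 1, 1), "B": (5, 5, 5)}
print(nearest_class((2, 2, 2), classes_3d))
```

Nothing in the helper depends on the number of dimensions, which is why the same idea carries over to the 10000-dimensional picture vectors.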

As you can see, we are using the distance between two points as the criterion for deciding which class a point belongs to. The 1xn vector we flattened each picture into is just a point in n-dimensional space. We divide the training vectors into 10 classes, one per digit, and whichever class the vector to be recognized is closest to is the class it most likely belongs to.

So for the vector to be recognized, we list the ten nearest training vectors together with the class each belongs to, then add a weight based on rank and compute a score. The score indicates which class the vector most likely belongs to, and that is the final result: the digit shown in the handwritten picture.

These are from the first post, and I'll focus on the math side of things below.

Considering that in some places it's not possible to enter math formulas (or it's not convenient to do so), I'll just post this paragraph as a picture.
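For readers without the picture, a reasonable guess at the formula is the ordinary Euclidean distance, which matches the squared differences summed along axis=1 in the code further down. The symbols are my own notation: x is the vector to be recognized, y is one training vector, and x_k, y_k are their k-th components.

```latex
d(x, y) = \sqrt{\sum_{k=1}^{n} (x_k - y_k)^2}, \qquad n = N^2 = 10000
```

For the earlier two-dimensional example this gives d(C, A) = \sqrt{(2-1)^2 + (2-1)^2} = \sqrt{2}, the middle school distance formula.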

After that it's straightforward: pick the numbers of the few training vectors closest to the image to be recognized, and they are basically the digit in that image. That is a bit simplistic, so I'll go deeper in the next post; this one is about calculating distances.

Main code

In the following code, the folder test holds the pictures to be recognized. The GetTestPicture function turns them into picture vectors, which are passed together with the training pictures pic to the CalculateResult function; it calculates the distance between each vector to be recognized and every training vector.

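# Get the image to be recognized
testFiles = os.listdir(r"./test/")
# GetTestPicture is assumed to live in OperatePicture (OP)
testPic = OP.GetTestPicture(testFiles)

# Calculate the possible classifications for each image to be recognized
# CalculateResult is assumed to live in PictureAlgorithm (PA)
result = PA.CalculateResult(testPic, pic)
for item in result:
    for i in range(n):
        print('Vector '+str(i+1)+' is '+str(item[i+n])+', the distance is '+str(item[i]))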

The CalculateResult function lives in a file that contains two functions, CalculateDistance and CalculateResult, which together make up the algorithm used to recognize the image.

Function CalculateResult

The logic of this function is simple and there's not much to say about it; it mainly delegates to the CalculateDistance function, which calculates the distances.

def CalculateResult(test, train):
    '''Calculate the possible classifications of the images to be recognized in test'''
    # Get the n most similar training images for each image
    testDis = CalculateDistance(test[:,0:N**2], train[:,0:N**2], train[:,N**2], n)
    # Turn testDis into a list
    tt = testDis.tolist()
    # Output the n nearest entries for each image to be recognized
    for i in tt:
        for j in i:
            print(j)
    # Return the rows so the main code can loop over the result
    return tt

Function CalculateDistance

The function takes four parameters: the vectors to be recognized test, the training vectors train, the numbers num that each training vector corresponds to, and n, the number of nearest vectors I want to get back.

# The NumPy call names below (zeros, sqrt, sum, sort) are reconstructed from context
import numpy as np

def CalculateDistance(test, train, num, n):
    '''Calculate the n most similar training images for each image'''
    # Each row holds the n distances first, then the n matching numbers
    dis = np.zeros(2*n*len(test)).reshape(len(test), 2*n)
    for i, item in enumerate(test):
        # Distance from this to-be-recognized image to every training image
        itemDis = np.sqrt(np.sum((item-train)**2, axis=1))
        # Sort the distances to find the n smallest
        sortDis = np.sort(itemDis)
        dis[i, 0:n] = sortDis[0:n]
        for j in range(n):
            # Find where the j-th smallest distance sits in the unsorted array
            maxPoint = list(itemDis).index(sortDis[j])
            # Look up the number at that position in num and store it in dis
            dis[i, j+n] = num[maxPoint]
    return dis

First we create a matrix whose number of rows is the number of vectors to be recognized in test and whose number of columns is 2*n; the first n columns hold the distances and the last n columns hold the numbers. Then we loop over each vector to be recognized.
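The layout of one row can be sketched as follows. The values are made-up toy numbers, and n is set to 3 instead of the article's 10 to keep the row short.

```python
n = 3  # the article uses n = 10

# One row of the result: n distances first, then the n matching digits
row = [33.6, 35.6, 38.7, 2.0, 2.0, 1.0]

distances = row[:n]   # first n columns
digits = row[n:]      # last n columns

for rank, (digit, distance) in enumerate(zip(digits, distances), start=1):
    print(f"Neighbor {rank}: digit {int(digit)}, distance {distance}")
```

This is the same indexing the main code uses when it prints item[i+n] as the number and item[i] as the distance.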

The distance between every training image and the current image to be recognized is computed directly, in a single line of code:

itemDis = np.sqrt(np.sum((item-train)**2, axis=1))

This line implements the distance calculation described above. I think it is still fairly dense; you can take it apart step by step yourself, so I won't go into detail here. Next we sort and find the nearest few vectors.
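For anyone who does want to take that line apart, here is a plain-Python unrolling. It assumes the line is a square root of a row-wise sum of squared differences, that is, the usual Euclidean distance; distances_to_all is a hypothetical name, and the toy vectors have 2 components instead of 10000.

```python
import math

def distances_to_all(item, train):
    """Euclidean distance from one vector to every row of train."""
    result = []
    for row in train:
        # Element-wise squared differences, the (item-train)**2 part
        squared = [(a - b) ** 2 for a, b in zip(item, row)]
        # Add them up per training row, the axis=1 part
        total = sum(squared)
        # Square root turns the sum into a distance
        result.append(math.sqrt(total))
    return result

train = [[0, 0], [3, 4], [6, 8]]
item = [0, 0]
dists = distances_to_all(item, train)
print(dists)
```

The one-line NumPy form does the same work for all training rows at once, because subtracting a single vector from a matrix applies the subtraction to every row.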

The logic here is: sort first, take the n smallest distances and store them in the matrix. Then find the position of each of those n distances in the original (unsorted) array, look up the number at the same position in num, and store it in the last n columns of dis.
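One thing to watch with looking positions up by value: if two training images are exactly the same distance away, searching for that value finds the first one both times. A minimal sketch of an alternative is to sort the positions themselves; nearest_indices is a hypothetical helper and the numbers are toy values.

```python
def nearest_indices(distances, n):
    """Positions of the n smallest distances, nearest first."""
    order = sorted(range(len(distances)), key=lambda k: distances[k])
    return order[:n]

distances = [5.0, 2.0, 2.0, 9.0]
labels = [7, 1, 4, 3]

# Lookup by value: both tied distances resolve to position 1
by_value = [distances.index(d) for d in sorted(distances)[:2]]
# Sorting positions: each training image is used once
by_index = nearest_indices(distances, 2)

print([labels[k] for k in by_value])
print([labels[k] for k in by_index])
```

With NumPy arrays, np.argsort plays the same role as nearest_indices and also removes the inner list search.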

That completes everything; all that remains is to return dis.

Practical test

I wrote some numbers myself, as shown in the picture. Our database is therefore still fairly small.

Then I wrote another number as the image to be recognized. After running it through the program, it directly outputs the ten most similar vectors:

The 1st vector is 2.0 and the distance is 33.62347223932534
The 2nd vector is 2.0 and the distance is 35.64182105224185
The 3rd vector is 2.0 and the distance is 38.69663119274146
The 4th vector is 2.0 and the distance is 43.52904133387693
The 5th vector is 2.0 and the distance is 43.69029199677604
The 6th vector is 1.0 and the distance is 43.730883339256714
The 7th vector is 6.0 and the distance is 44.94800943845918
The 8th vector is 2.0 and the distance is 45.033283944455924
The 9th vector is 4.0 and the distance is 45.43926712996951
The 10th vector is 7.0 and the distance is 45.64893989116544

After that I tried the digits 1 to 9 in turn, and my own handwritten numbers were all recognized correctly, so the accuracy on my handwriting is quite high. Getting to this step means the project is largely complete.

Then I tried images I found on the internet and found that hardly any were recognized correctly. That means our database is still too small and only recognizes my own handwriting. That said, this also suggests it could be turned into a handwriting-style recognition program.

So if you want to improve accuracy, expanding the image library is a must. That's all for this part.

Summary

All the source code is on my GitHub; check it out if you're interested.

At this point the algorithm itself is written. It is relatively simple and only uses something similar to the k-nearest neighbors algorithm.

The next post will cover an idea for weighting the top n ranks to improve accuracy.

So that's all for now on this one, thanks.
