SoFunction
Updated on 2024-10-28

Python OpenCV: splitting a table image along its frame lines and recognizing the cells


Updated October 30, 2019 09:32:32 by HelloWorld!
This article shows how to use Python and OpenCV to split a picture of a table along its frame lines and recognize the text in each cell. The sample code is given in full and should be a useful reference for study or work.

The short program below uses Python and OpenCV to cut a table image into sub-images along the table's frame lines, then recognizes the text in each sub-image with pytesseract. I hope it helps anyone who needs it. The full code follows.

# -*- coding: utf-8 -*-
"""
Created on Tue May 28 19:23:19 2019
Split image into subimages according to table box line intersections (pass in image path)
@author: hx
"""
 
import cv2
import numpy as np
import pytesseract
 
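image = cv2.imread('C:/Users/Administrator/Desktop/', 1) # 1 = load as a color (BGR) image
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Binarize; inverting gray first makes text and lines white on a black background
binary = cv2.adaptiveThreshold(~gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C, cv2.THRESH_BINARY, 35, -5)
#ret,binary = cv2.threshold(~gray, 127, 255, cv2.THRESH_BINARY)
cv2.imshow("Binarized image:", binary) # show the image
cv2.waitKey(0)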
 
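rows,cols=binary.shape
scale = 40
# Detect horizontal lines: erode, then dilate, with a wide kernel that is 1 pixel high
kernel = cv2.getStructuringElement(cv2.MORPH_RECT,(cols//scale,1))
eroded = cv2.erode(binary,kernel,iterations = 1)
#cv2.imshow("Eroded Image",eroded)
dilatedcol = cv2.dilate(eroded,kernel,iterations = 1)
cv2.imshow("Horizontal table lines:",dilatedcol)
cv2.waitKey(0)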
 
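# Detect vertical lines: same idea with a tall kernel that is 1 pixel wide
scale = 20
kernel = cv2.getStructuringElement(cv2.MORPH_RECT,(1,rows//scale))
eroded = cv2.erode(binary,kernel,iterations = 1)
dilatedrow = cv2.dilate(eroded,kernel,iterations = 1)
cv2.imshow("Vertical table lines:",dilatedrow)
cv2.waitKey(0)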
 
# Mark the intersections: pixels that lie on both a horizontal and a vertical line
bitwiseAnd = cv2.bitwise_and(dilatedcol,dilatedrow)
cv2.imshow("Table intersections:",bitwiseAnd)
cv2.waitKey(0)
# cv2.imwrite("",bitwiseAnd) # save the binary intersection image to a file
 
# Mark the table: combine the horizontal and vertical lines into the full grid
merge = cv2.add(dilatedcol,dilatedrow)
cv2.imshow("Full table grid:",merge)
cv2.waitKey(0)
 
 
# Subtract the grid from the binary image to remove the table frame lines
merge2 = cv2.subtract(binary,merge)
cv2.imshow("Image with table frame lines removed:",merge2)
cv2.waitKey(0)
 
# Find the white intersection pixels in the black-and-white image and take out their coordinates
ys,xs = np.where(bitwiseAnd>0)

mylisty=[] # y-coordinates (rows)
mylistx=[] # x-coordinates (columns)
 
# Each intersection covers many pixels with nearly equal coordinates. Sort the coordinates and look for jumps:
# a jump marks the end of one intersection, so only the last point of each run of similar values is kept.
# The jump threshold of 10 is not fixed and needs fine-tuning per image; the jumps roughly correspond to
# the cell height (y-coordinate jump) and cell width (x-coordinate jump).
myxs=np.sort(xs)
for i in range(len(myxs)-1):
  if(myxs[i+1]-myxs[i]>10):
    mylistx.append(myxs[i])
mylistx.append(myxs[-1]) # add the last point
 
 
myys=np.sort(ys)
#print(np.sort(ys))
for i in range(len(myys)-1):
  if(myys[i+1]-myys[i]>10):
    mylisty.append(myys[i])
mylisty.append(myys[-1]) # add the last point
 
print('mylisty',mylisty)
print('mylistx',mylistx)
 
 
pytesseract.pytesseract.tesseract_cmd = 'E:/Tesseract-OCR/' # location of the Tesseract executable

# Loop over the y-coordinates and x-coordinates to split the table into cells
for i in range(len(mylisty)-1):
  for j in range(len(mylistx)-1):
    # When slicing, the first index is the y-coordinate and the second is the x-coordinate
    ROI = image[mylisty[i]+3:mylisty[i+1]-3,mylistx[j]:mylistx[j+1]-3] # the 3-pixel offsets shrink the ROI
    cv2.imshow("Segmented sub-image:",ROI)
    cv2.waitKey(0)

    #special_char_list = '`~!@#$%^&*()-_=+[]{}|\\;:‘',。《》/?ˇ'
    text1 = pytesseract.image_to_string(ROI) # read the text; the default language is English
    #text1 = ''.join([char for char in text1 if char not in special_char_list])
    print('Text recognized in the sub-image: '+text1)
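
The coordinate-grouping step in the two sorting loops above can be tried on its own, without an image. The following is a minimal pure-Python sketch of that idea; the helper name `group_line_coords`, its `min_gap` parameter (standing in for the hard-coded 10), and the sample coordinates are all hypothetical and not part of the original program.

```python
def group_line_coords(coords, min_gap=10):
    """Collapse runs of nearby pixel coordinates into one value per grid line.

    Keeps the last (largest) coordinate of every run, as the loops in the
    program do. min_gap is an assumed name for the jump threshold.
    """
    ordered = sorted(coords)
    if not ordered:
        return []
    kept = []
    # Compare each coordinate with its successor; a large gap ends a run.
    for current, following in zip(ordered, ordered[1:]):
        if following - current > min_gap:
            kept.append(current)
    # The final run has no successor, so its last point is added explicitly.
    kept.append(ordered[-1])
    return kept


# Hypothetical x-coordinates of intersection pixels: three lines, each a few pixels thick.
sample_xs = [12, 13, 14, 150, 151, 152, 153, 300, 301]
print(group_line_coords(sample_xs))  # [14, 153, 301]
```

If two grid lines are closer together than `min_gap`, they are merged into one, which is why the threshold has to be tuned per image.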
    

That is the whole program.
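
As a supplementary sketch, the way the nested loop turns grid-line coordinates into crop rectangles can also be checked without an image. The helper `cell_boxes`, its `margin` parameter (standing in for the hard-coded 3), and the sample coordinates are hypothetical. Skipping empty rectangles is an added safeguard that the original loop does not have.

```python
def cell_boxes(line_ys, line_xs, margin=3):
    """Return (row, col, top, bottom, left, right) for each cell between adjacent grid lines.

    Mirrors the slicing in the program: the top, bottom and right edges are
    pulled in by margin pixels, while the left edge is used unchanged.
    """
    boxes = []
    for i in range(len(line_ys) - 1):
        for j in range(len(line_xs) - 1):
            top = line_ys[i] + margin
            bottom = line_ys[i + 1] - margin
            left = line_xs[j]
            right = line_xs[j + 1] - margin
            # Assumption: skip cells that shrink to nothing instead of cropping an empty ROI.
            if bottom > top and right > left:
                boxes.append((i, j, top, bottom, left, right))
    return boxes


# Hypothetical grid: 3 horizontal lines and 3 vertical lines give a 2 x 2 table.
for box in cell_boxes([10, 60, 110], [14, 153, 301]):
    print(box)
```

With a NumPy image, each tuple corresponds to the slice `image[top:bottom, left:right]`, rows first and columns second.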
