SoFunction
Updated on 2024-10-29

Python example: checking whether each line of a txt file contains a substring and saving the matching lines

Suppose you need to batch process multiple txt files and write every line that contains a given substring to a new txt file. Here my substrings are "_9" and "_10".

Here are two of the lines I want to extract (there are actually many more):

Straight to the code:

#! /usr/bin/python
# -*- coding:UTF-8 -*-
 
import os
import os.path
import string

First, the path where the txt files are located and the target path where the results are saved (change them to match your own directories):

Crop_Ocr_txt is the folder containing all the txt files I need to batch process. In the same parent directory I created a new folder named 1000_simple_OCRtxts. The target path is arbitrary; just pick one that is easy to find.

txt_path = 'D:/youxinProjections/trafic-youxin/MobileNet_v1/obtain_qq_json_new/Crop_Ocr_txt/'
des_txt_path = 'D:/youxinProjections/trafic-youxin/MobileNet_v1/obtain_qq_json_new/1000_simple_OCRtxts/'
 
txt_files = os.listdir(txt_path) # txt_files holds the names of all entries (here, the txt files) in the directory
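
Note that os.listdir returns every entry in a folder, including subfolders and files with other extensions. A minimal sketch of narrowing the list to txt files only, assuming a hypothetical helper named list_txt_files that is not part of the original code:

```python
import os

def list_txt_files(folder):
    """Return the sorted names of regular files in `folder` ending in .txt.

    `list_txt_files` is a hypothetical helper name used for illustration.
    """
    names = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Keep regular files only; compare the extension case-insensitively
        if os.path.isfile(full_path) and name.lower().endswith('.txt'):
            names.append(name)
    return sorted(names)
```

With this helper, txt_files could be built as list_txt_files(txt_path), which would make a later directory check unnecessary.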

Next, define a function that picks out the lines containing the substrings and writes them to a new txt file. The main function then simply calls it:

def select_simples():
  for txtfile in txt_files:
    # Skip subdirectories; only regular files are processed
    if not os.path.isdir(os.path.join(txt_path, txtfile)):
      in_file = open(txt_path + txtfile, 'r')
      # The target folder must already exist. open() creates a file with the same name as the source; 'a' appends to it
      out_file = open(des_txt_path + txtfile, 'a')
      lines = in_file.readlines()
      in_file.close()
      str1 = '_9' # This is the first substring I'm checking for
      str2 = '_10' # This is the second substring
      for line in lines:
        str_name = line.split(" ")[0] # First space-separated field of the line; in my txt files this is the *.jpg name

        # Python 2 alternative: if (string.find(str_name, str1)!=-1) or (string.find(str_name, str2)!=-1):
        if (str1 in str_name) or (str2 in str_name): # `in` tests whether either substring occurs in str_name
          out_file.write(line) # If it contains a substring, write the whole line to the new txt file
          print(str_name)
      out_file.close()
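
The same filtering logic can also be sketched as a reusable function. This is only a minimal sketch under assumptions, not the code used above: the name copy_matching_lines and its parameters are hypothetical. It uses `with` so both files are closed automatically, creates the target folder if it is missing, and opens the output with 'w' so that running it twice does not duplicate lines. Keep in mind that a plain substring test with '_9' also matches names such as x_90.jpg.

```python
import os

def copy_matching_lines(src_dir, dst_dir, substrings=('_9', '_10')):
    """Copy lines whose first space-separated field contains any substring.

    Hypothetical helper: for every regular file in src_dir, the matching
    lines are written to a file with the same name in dst_dir.
    Returns the total number of lines written.
    """
    os.makedirs(dst_dir, exist_ok=True)  # create the target folder if needed
    written = 0
    for name in os.listdir(src_dir):
        src_path = os.path.join(src_dir, name)
        if os.path.isdir(src_path):
            continue  # skip subfolders
        with open(src_path, 'r') as in_file, \
             open(os.path.join(dst_dir, name), 'w') as out_file:
            for line in in_file:
                first_field = line.split(' ')[0]
                if any(s in first_field for s in substrings):
                    out_file.write(line)
                    written += 1
    return written
```

For example, copy_matching_lines(txt_path, des_txt_path) would process the two folders defined earlier.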

Now the main function:

if __name__ == '__main__':
  select_simples()

Here is the final result:

Perfect, isn't it!

Supplementary knowledge: using Python to detect duplicate lines in a file, and reading a file line by line to check whether each line appears in another file

Without further ado, here is the code:

#!/bin/env python
# coding:utf-8
# This program checks whether a file contains duplicate lines
# and writes the repeated lines to a separate file

res_list = []
f = open('./','r') # NOTE: the input file name is missing in the source; add your own file name after './'
res_dup = []
index = 0
file_dul = open('./r_d.txt', 'w') # Receives the duplicate lines
file_last = open('./r_nd.txt','w') # Receives each distinct line once
for line in f.readlines():
  index = index + 1
  if line in res_list:
    temp_str = ""
    #temp_str = temp_str + str(index) + ',' # index has to be converted with str() for this to work
    temp_line = ''.join(line)
    temp_str = temp_str+temp_line
    # temp_str ends up as a plain str
    file_dul.write(temp_str) # Duplicates are written to this file
  else:
    res_list.append(line)
    file_last.write(line)
f.close()
file_dul.close()
file_last.close()
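
Checking `line in res_list` scans the whole list for every line, which becomes slow on large files. A minimal sketch of the same split using a set for the membership test; the function name split_duplicates and its return shape are assumptions for illustration and not part of the script above:

```python
def split_duplicates(lines):
    """Split lines into (unique, duplicates), preserving input order.

    Hypothetical helper: the first occurrence of each line goes to `unique`,
    every later repeat goes to `duplicates`.
    """
    seen = set()
    unique, duplicates = [], []
    for line in lines:
        if line in seen:
            duplicates.append(line)
        else:
            seen.add(line)
            unique.append(line)
    return unique, duplicates
```

The two returned lists could then be written out with writelines, one to the duplicates file and one to the deduplicated file.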
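
The second program reads one file line by line and checks whether each line appears in the contents of another file:

#!/bin/env python
# coding:utf-8
import re

res_list = []
f = open('./','r') # NOTE: file name missing in the source; this is the file read line by line
f2 = open('./','r') # NOTE: file name missing in the source; this is the file that is searched

index = 0
# File names with no duplicate (not found in the second file)
file_dul = open('./m_nd.txt', 'w')
# Duplicate file names (found in the second file)
file_ex = open('./m_d.txt', 'w')

virstr = f2.read()
for line in f.readlines():
  line = line.strip('\n')
  if re.search(re.escape(line), virstr): # re.escape makes the line match literally instead of as a regex
    line = line + '\n'
    file_ex.write(line)
    # A deletion could be triggered here, e.g. rm -rf filename
  else:
    line = line + '\n'
    file_dul.write(line)
f.close()
f2.close()
file_dul.close()
file_ex.close()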
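
When both files hold exactly one entry per line, an exact whole-line comparison can be sketched with a set instead of searching the other file's full text. This is a minimal sketch under that assumption, and it behaves differently from a substring search: `a.txt` would not be reported as present just because `data.txt` is. The name split_by_presence is hypothetical.

```python
def split_by_presence(lines, reference_lines):
    """Split `lines` into (found, missing) by exact match in reference_lines.

    Hypothetical helper: trailing newlines are stripped before comparing,
    and the returned entries carry no trailing newline.
    """
    reference = {ref.rstrip('\n') for ref in reference_lines}
    found, missing = [], []
    for line in lines:
        entry = line.rstrip('\n')
        if entry in reference:
            found.append(entry)
        else:
            missing.append(entry)
    return found, missing
```

Both arguments can be open file objects or the lists returned by readlines().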

That is all for this example of using Python to check whether each line of a txt file contains a substring and saving the matching lines. I hope it serves as a useful reference, and thank you for your support.