SoFunction
Updated on 2024-10-28

Python sample code for automating file organization

Automated organization of computer files

This article uses Python to handle common file-management tasks: automatic file classification, quick search for files and folders, cleanup of duplicate files, batch conversion of image formats, and sorting pictures by shooting date.

1. Automatic classification of files

Categorize and organize files into different folders according to their extensions.

Using os and shutil modules

The os module provides a number of functions for manipulating files and folders, allowing you to create, delete, view attributes and find paths to files or folders.

The shutil module provides functions for moving, copying, and compressing files or folders.
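
As a rough illustration of the calls this section relies on, the sketch below exercises os.listdir, os.path.isfile, os.makedirs, shutil.copy and shutil.move inside a temporary directory. The folder and file names (src, dst, a.txt, copy.txt) are made up for the example and are not part of the original article.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory so no real files are touched.
base = tempfile.mkdtemp()
src = os.path.join(base, "src")   # hypothetical source folder
dst = os.path.join(base, "dst")   # hypothetical destination folder

os.makedirs(src)                  # create a folder, including missing parents
with open(os.path.join(src, "a.txt"), "w") as fh:
    fh.write("hello")

names = os.listdir(src)           # names of the entries inside src
is_file = os.path.isfile(os.path.join(src, "a.txt"))

os.makedirs(dst, exist_ok=True)   # no error if the folder already exists
shutil.copy(os.path.join(src, "a.txt"), os.path.join(dst, "copy.txt"))
shutil.move(os.path.join(src, "a.txt"), dst)  # move the file into an existing folder

moved = sorted(os.listdir(dst))
shutil.rmtree(base)               # remove the temporary tree again
```

When the destination passed to shutil.move is an existing folder, the file keeps its name and lands inside that folder, which is the behavior the classification script depends on.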

"""
The os module provides a number of functions for manipulating files and folders, which allow you to create, delete, view attributes, and find paths to files or folders.
The shutil module provides functions to move, copy, and compress files and folders.
"""
import os
import shutil

# Directory of source files
src_dir = "Documents to be classified/"
# Directory of the output file
output_dir = 'Classified documents/'
files = os.listdir(src_dir)  # List the names of all files and subfolders in the src_dir directory
print(files)
for f in files:
    # Build the full path of the entry
    src_path = src_dir + f
    # Determine if it's a file
    if os.path.isfile(src_path):
        # Get the file suffix, splice it with the output directory to form the output folder path
        output_path = output_dir + f.split('.')[-1]
        # Determine if the output folder exists, if not it needs to be created
        if not os.path.exists(output_path):
            os.makedirs(output_path)
        # Move the file to the folder corresponding to its extension in the output directory
        shutil.move(src_path, output_path)


Using the pathlib module

from pathlib import Path

# Directory of source files
src_dir_name = "Documents to be classified/"
# Directory of the output file
output_dir_name = 'Classified documents/'

# Use the Path() function to create path objects for the source and destination folders
src_dir = Path(src_dir_name)
output_dir = Path(output_dir_name)

# Find files and subfolders in the source folder, * means return all files and subfolders (full path)
files = src_dir.glob('*')
for f in files:
    # Determine if a path represents a file
    if f.is_file():
        # Get the output folder path
        output_path = output_dir / f.suffix.strip('.')
        # Determine if the output folder exists
        if not output_path.exists():
            # Create if not present, parents is True for creating multi-level folders
            output_path.mkdir(parents=True)
        # Rename the file path to the target path, which moves the file
        f.replace(output_path / f.name)
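
The same idea can be wrapped in a reusable function. The following is only a sketch under a few assumptions: the function name classify_files and the fallback folder name 'no_extension' (for files without a suffix) are hypothetical and do not come from the original article, and extensions are lower-cased so that .PDF and .pdf share a folder.

```python
import tempfile
from pathlib import Path


def classify_files(src_dir: Path, output_dir: Path) -> dict:
    """Move each file in src_dir into output_dir/<extension>/.

    Returns a mapping of file name -> folder name it was moved to.
    """
    moved = {}
    # sorted() materializes the listing before any file is moved
    for f in sorted(src_dir.glob('*')):
        if not f.is_file():
            continue
        # 'no_extension' is a hypothetical fallback for names without a suffix
        folder_name = f.suffix.lstrip('.').lower() or 'no_extension'
        target_dir = output_dir / folder_name
        target_dir.mkdir(parents=True, exist_ok=True)
        f.replace(target_dir / f.name)
        moved[f.name] = folder_name
    return moved


# Small self-contained demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'src'
    out = Path(tmp) / 'out'
    src.mkdir()
    for name in ('report.PDF', 'photo.jpg', 'README'):
        (src / name).write_text('x')
    result = classify_files(src, out)
    created = sorted(p.name for p in out.iterdir())
```

Passing exist_ok=True to mkdir() removes the need for a separate exists() check.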

2. Quick search of files and folders

Use Python to write programs that quickly find files and folders, with both exact and fuzzy matching.

Exact search for files and folders

from pathlib import Path

while True:
    folder = input("Please enter the path of the directory to search (e.g., D:\\): ")
    folder = Path(folder.strip())  # Use Path() to create a path object
    # Determine if the input path exists and if it is a directory
    if folder.exists() and folder.is_dir():
        break
    else:
        print("The path entered was incorrect, please re-enter!")
search_word = input("Please enter the name of the file or folder to be found:").strip()  # Get the name of the file or folder entered, stripping out the leading and trailing spaces
"""
Difference between the glob() and rglob() functions:
Both glob() and rglob() can use wildcards to find files and subfolders under a given path.
The difference is that:
    glob() only searches the given directory level (unless the pattern contains **), whereas rglob() searches all subfolders recursively.
"""
# Use the rglob() function to find files and folders whose names are identical to the keyword under the path entered by the user, and convert the results into a list.
results = list(folder.rglob(search_word))
if len(results) != 0:
    print(f'The following results were found under [{folder}]:')
    for r in results:
        print(r)
else:
    print(f'No file or folder named [{search_word}] was found under [{folder}]!')
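
To make the difference between glob() and rglob() concrete, here is a small sketch that builds a throwaway folder tree and runs both calls on it. The folder and file names (sub, deep, notes.txt) are invented for the example.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub' / 'deep').mkdir(parents=True)
    (root / 'notes.txt').write_text('top level')
    (root / 'sub' / 'notes.txt').write_text('one level down')
    (root / 'sub' / 'deep' / 'notes.txt').write_text('two levels down')

    # glob() only looks at the direct children of root
    one_level = sorted(p.relative_to(root).as_posix() for p in root.glob('notes.txt'))
    # rglob() looks in root and in every subfolder below it
    all_levels = sorted(p.relative_to(root).as_posix() for p in root.rglob('notes.txt'))
    # Wrapping a keyword in * wildcards matches any name that contains it
    fuzzy = sorted(p.relative_to(root).as_posix() for p in root.rglob('*ee*'))
```

Here glob() finds only the top-level notes.txt, rglob() finds all three copies, and the wildcard pattern matches the folder sub/deep because its name contains "ee".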


Fuzzy search for files and folders

# author:mlnt
# createdate:2022/8/23
from pathlib import Path

while True:
    folder = input("Please enter the path of the directory to search (e.g., D:\\): ")
    folder = Path(folder.strip())  # Use Path() to create a path object
    # Determine if the input path exists and if it is a directory
    if folder.exists() and folder.is_dir():
        break
    else:
        print("The path entered was incorrect, please re-enter!")
search_word = input("Please enter the name of the file or folder to be found:").strip()  # Get the name of the file or folder entered, stripping out the leading and trailing spaces
"""
Difference between the glob() and rglob() functions:
Both glob() and rglob() can use wildcards to find files and subfolders under a given path.
The difference is that:
    glob() only searches the given directory level (unless the pattern contains **), whereas rglob() searches all subfolders recursively.
"""
# Use the rglob() function to find files and folders whose names contain the keyword under the path entered by the user, and convert the results into a list.
results = list(folder.rglob(f'*{search_word}*'))
if len(results) == 0:
    print(f'No file or folder with a name containing [{search_word}] was found under [{folder}]!')
else:
    result_folders = []  # Keyword-related folders found
    result_files = []   # Keyword-related files found
    for r in results:
        if r.is_dir():
            # If directory (folder), add to folder list
            result_folders.append(r)
        else:
            result_files.append(r)
    if len(result_folders) != 0:
        print(f'Folders related to the keyword [{search_word}] found under [{folder}]:')
        for f in result_folders:
            print(f)
    if len(result_files) != 0:
        print(f'Files related to the keyword [{search_word}] found under [{folder}]:')
        for f in result_files:
            print(f)


3. Automatic clean-up of duplicate files

Steps to implement automatic file cleanup:

1. List all files in the specified folder;

2. Compare the contents of the files pairwise;

3. If the contents are the same, move one of the files to the specified folder.

"""
Steps to implement automatic file cleanup:
1. List all files in the specified folder;
2. Compare the contents of the files pairwise to see if they are the same;
3. If the contents are the same, move one of the files to the specified folder.
"""

# Import the Path class from the pathlib module
from pathlib import Path
# Import the cmp() function from the filecmp module to do file comparisons
from filecmp import cmp

input_dir = 'Documents to be processed'
output_dir = 'Duplicate documents'
# Create the Path object
src_folder = Path(input_dir)
output_folder = Path(output_dir)
# Determine if the output directory exists
if not output_folder.exists():
    # Create directories if they don't exist (multi-level creation)
    output_folder.mkdir(parents=True)

results = list(src_folder.glob('*'))  # List files and subfolders in a given directory
file_list = []
for r in results:
    # Determine if a path points to a file
    if r.is_file():
        # Yes add to file list
        file_list.append(r)

# Iterate through the list of documents and compare
for i in file_list:
    for j in file_list:
        if i != j and i.exists() and j.exists():
            # Compare two files to see if they are the same
            if cmp(i, j):
                # If two files are the same, move one of them to the specified folder
                # (to delete the duplicate instead, j.unlink() could be used)
                j.replace(output_folder / j.name)
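
Pairwise comparison needs on the order of n² comparisons for n files. In addition, with its default shallow=True, filecmp.cmp treats two files as equal when their os.stat() signatures (type, size, modification time) match, without reading them. As an alternative sketch, not the article's method, the version below groups files by a SHA-256 hash of their contents and moves every file after the first in each group. The function names file_digest and move_duplicates are hypothetical.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, 'rb') as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def move_duplicates(src_folder: Path, output_folder: Path) -> list:
    """Keep the first file of each content group and move the rest.

    Returns the names of the files that were moved.
    """
    output_folder.mkdir(parents=True, exist_ok=True)
    groups = defaultdict(list)
    for f in sorted(src_folder.glob('*')):
        if f.is_file():
            groups[file_digest(f)].append(f)
    moved = []
    for same_content in groups.values():
        for dup in same_content[1:]:
            dup.replace(output_folder / dup.name)
            moved.append(dup.name)
    return moved


# Small self-contained demo in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'src'
    dup_dir = Path(tmp) / 'dups'
    src.mkdir()
    (src / 'a.txt').write_text('same')
    (src / 'b.txt').write_text('same')
    (src / 'c.txt').write_text('different')
    moved_names = move_duplicates(src, dup_dir)
    remaining = sorted(p.name for p in src.iterdir())
```

Each file is read once, so the cost grows linearly with the number of files. To stay with the article's approach while forcing a byte-by-byte comparison, cmp(i, j, shallow=False) can be used.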


4. Batch conversion of image formats

from pathlib import Path
from PIL import Image


input_dir = 'input_images'
output_dir = 'output_images'
# Create the Path object
src_folder = Path(input_dir)
output_folder = Path(output_dir)
# Determine if the output directory exists
if not output_folder.exists():
    # Create directories if they don't exist (multi-level creation)
    output_folder.mkdir(parents=True)

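# Find images with a jpg or jpeg suffix by checking the suffix explicitly
# (in a glob pattern, [...] is a set of single characters, not a choice between extensions)
file_list = [p for p in src_folder.glob('*') if p.suffix.lower() in ('.jpg', '.jpeg')]
for f in file_list:
    output_file = output_folder / f.name
    # Replace the path extension
    output_file = output_file.with_suffix('.png')
    # Open the image and save it to the specified path in the new format
    Image.open(f).save(output_file)
    print(f'{f.name}-->Format conversion complete!')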
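
A glob pattern such as '*[.jpg|.jpeg]' deserves care: square brackets define a set of single characters, so the pattern matches any name whose last character is one of ". j p g | e", not only names ending in .jpg or .jpeg. The sketch below demonstrates this with empty placeholder files (the file names are made up for the example) and contrasts it with an explicit suffix check.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name in ('a.jpg', 'b.jpeg', 'c.png', 'd.txt', 'e.log'):
        (folder / name).write_bytes(b'')

    # [...] matches ONE character from the set, so names ending in 'g' all match
    by_char_class = sorted(p.name for p in folder.glob('*[.jpg|.jpeg]'))
    # Comparing the suffix explicitly keeps only .jpg and .jpeg names
    by_suffix = sorted(
        p.name for p in folder.glob('*')
        if p.suffix.lower() in ('.jpg', '.jpeg')
    )
```

With these files, the bracket pattern also picks up c.png and e.log, while the suffix check returns only a.jpg and b.jpeg.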


5. Automatic sorting of pictures by shooting date

The exifread module needs to be installed:

pip install exifread

Steps:

1. List all the pictures in the specified folder;

2. Read the EXIF (Exchangeable Image File Format) information of the picture and extract the shooting date;

3. Convert the shooting date to the desired format and then create a folder using the shooting date;

4. Move the pictures to the folder corresponding to the shooting date.

"""
Steps:
    1. List all the pictures in the specified folder;
    2. Read the EXIF (Exchangeable Image File Format) information of the pictures and extract the shooting date (requires: pip install exifread);
    3. Convert the shooting date to the desired format, and then create a folder named after the shooting date;
    4. Move the pictures to the folder corresponding to the shooting date.
"""
from pathlib import Path
from datetime import datetime
from exifread import process_file

input_dir = 'input_images'
output_dir = 'output_dir'
# Create the Path object
src_folder = Path(input_dir)
output_folder = Path(output_dir)
# Determine if the output directory exists
if not output_folder.exists():
    # Create directories if they don't exist (multi-level creation)
    output_folder.mkdir(parents=True)

# Find images with a jpg or jpeg suffix
file_list = [p for p in src_folder.glob('*') if p.suffix.lower() in ('.jpg', '.jpeg')]
for f in file_list:
    with open(f, 'rb') as fp:
        # Read the EXIF information of an image
        # process_file function returns the read EXIF information in dictionary format
        tags = process_file(fp, details=False)
    # Determine if a shooting date is in the dictionary
    if 'EXIF DateTimeOriginal' in tags:
        dto = str(tags['EXIF DateTimeOriginal'])
        # Convert shooting date to desired format as folder name
        folder_name = datetime.strptime(dto, '%Y:%m:%d %H:%M:%S').strftime('%Y-%m-%d')
        # Set the path to the output directory
        output_path = output_folder / folder_name
        if not output_path.exists():
            output_path.mkdir(parents=True)
        # Move pictures to the folder corresponding to the date they were taken
        f.replace(output_path / f.name)
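
The date conversion in step 3 can be tried on its own, without any image files. The sketch below assumes the tag value is a string of the form 'YYYY:MM:DD HH:MM:SS'. The helper name exif_date_to_folder_name is hypothetical, and returning None for unparsable values is a design choice made for this example, not something the original script does.

```python
from datetime import datetime
from typing import Optional


def exif_date_to_folder_name(dto: str) -> Optional[str]:
    """Convert 'YYYY:MM:DD HH:MM:SS' to 'YYYY-MM-DD'; return None if parsing fails."""
    try:
        parsed = datetime.strptime(dto.strip(), '%Y:%m:%d %H:%M:%S')
    except ValueError:
        # The value does not follow the expected format
        return None
    return parsed.strftime('%Y-%m-%d')


ok = exif_date_to_folder_name('2022:08:23 10:15:30')
bad = exif_date_to_folder_name('not a date')
```

A caller could skip pictures for which the helper returns None instead of letting strptime raise and stop the whole run.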


That concludes this detailed look at Python sample code for automating file organization. For more on organizing files with Python, please see my other related articles!