SoFunction
Updated on 2024-10-30

Using Python's hashlib to Batch-Store File MD5 Hashes in a Database

Synopsis

This technique has several possible applications:

  • If you need to verify or validate a file, you can use its MD5 hash to check whether the file has been corrupted or changed. MD5 is no longer collision-resistant, so it reliably detects accidental corruption but not deliberate tampering by a capable attacker.
  • If you need to categorize or de-duplicate files, you can use MD5 hashes to tell whether two files have identical content.
  • If you need to store or transfer files, you can use an Access database to manage the file paths and their MD5 hashes.

Here are some specific scenarios based on these applications:

You are a software developer distributing an installation package for users to download. To let users check the integrity of the package, you can use a short piece of code to generate its MD5 hash and store it in an Access database. After downloading the package, users compute its MD5 hash themselves and compare it with the stored value to confirm that the download is intact.
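The user-side check in this scenario could look something like the sketch below. The helper names file_md5 and matches_published_md5 are hypothetical, and the published hash is assumed to arrive as a plain hexadecimal string (for example, copied from a download page).

```python
import hashlib


def file_md5(path, chunk_size=4096):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    md5_hash = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks so a large package is never loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5_hash.update(chunk)
    return md5_hash.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare the local file's MD5 with the published value.

    Surrounding whitespace and letter case in the published value are ignored.
    """
    return file_md5(path) == published_md5.strip().lower()
```

A mismatch only says that the two files differ. It does not say which copy is the damaged one.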

You are a data analyst working with a large number of data files. To catch duplicate or corrupted data files, you can use a short piece of code to generate the MD5 hash of each file and store it in an Access database. When you need to query or analyze a data file, you can look up its MD5 hash in the database to locate the corresponding file quickly.
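As a rough illustration of the de-duplication idea, the sketch below groups file paths by MD5 hash. It assumes the rows are (filepath, md5) pairs, for example the result of SELECT filepath, md5 FROM filemd. The function name find_duplicates is hypothetical.

```python
from collections import defaultdict


def find_duplicates(rows):
    """Group file paths by MD5 and keep only hashes shared by two or more files.

    `rows` is an iterable of (filepath, md5) pairs.
    """
    paths_by_hash = defaultdict(list)
    for filepath, md5 in rows:
        paths_by_hash[md5].append(filepath)
    # A hash that maps to a single path is not a duplicate, so drop it
    return {md5: paths for md5, paths in paths_by_hash.items() if len(paths) > 1}
```

Files that share an MD5 hash almost certainly have identical content. A byte-by-byte comparison of the candidates removes any remaining doubt before a file is deleted.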

You are a network administrator who needs to back up important files on a server. To save space and time, you can use a short piece of code to generate the MD5 hash of each important file and store it in an Access database. When you need to restore files, comparing the MD5 hashes on the server with those on the backup device tells you which files need to be updated or overwritten.
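The comparison step in this scenario can be sketched as follows, assuming the hashes from each side have already been loaded into dictionaries that map a relative file path to its MD5 string. The function name files_to_update and this dictionary layout are assumptions made for illustration.

```python
def files_to_update(server_hashes, backup_hashes):
    """Return the paths that are missing from the backup or whose MD5 differs.

    Both arguments map a relative file path to its MD5 hex string.
    """
    return sorted(
        path
        for path, md5 in server_hashes.items()
        # dict.get returns None for a missing path, which never equals a hash string
        if backup_hashes.get(path) != md5
    )
```

Only the files in the returned list need to be copied, which is where the space and time savings come from.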

Source Code

import os
import hashlib
import pyodbc
 
# Connect to the Access database.
# NOTE: the database file name after "DBQ=./" is missing in the source;
# supply the path to your own .mdb/.accdb file.
conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=./;')
# conn = pyodbc.connect(r'Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=;')
cursor = conn.cursor()

# Iterate over all entries in the current folder
for file in os.listdir("."):
    # Skip subdirectories
    if os.path.isdir(file):
        continue
    # Get the full file path
    file_path = os.path.abspath(file)
    # Generate the MD5 hash of the file content, reading 4096 bytes at a time
    md5_hash = hashlib.md5()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            md5_hash.update(chunk)
    md5_hex = md5_hash.hexdigest()
    # Insert the file path and MD5 hash into the filemd table (which must already exist)
    cursor.execute("INSERT INTO filemd (filepath, md5) VALUES (?, ?)", (file_path, md5_hex))

# Commit the transaction and close the connection
conn.commit()
conn.close()

Source Code Description

This code iterates through all the files in the current folder, calculates the MD5 hash of each file, and stores the file path and MD5 hash in an Access database. Specifically, it does the following:

  • Imports the os, hashlib, and pyodbc modules for operating system access, hash algorithms, and database connections, respectively.
  • Uses the pyodbc.connect function to connect to an Access database, specifying the driver and database file name.
  • Creates a cursor object with conn.cursor for executing SQL statements.
  • Uses the os.listdir function to get the names of all entries in the current folder.
  • Uses a for loop to iterate through each name.
  • Uses the os.path.isdir function to determine whether the entry is a subdirectory and skips it if so.
  • Uses the os.path.abspath function to get the full file path.
  • Creates an md5_hash object for computing the MD5 hash.
  • Uses the open function to open the file in binary mode and a for loop to read the data in 4096-byte blocks.
  • Uses the md5_hash.update method to feed each block into the MD5 computation.
  • Uses the md5_hash.hexdigest method to get the final MD5 value as a hexadecimal string.
  • Uses the cursor.execute method to run the SQL statement that inserts the file path and MD5 hash into the filemd table. The code does not create this table itself, so if it does not exist, create it first.
  • Uses the conn.commit method to commit the transaction and save the data to the database.
  • Uses the conn.close method to close the database connection.
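The listing expects the filemd table to exist already. One way to handle table creation and the batch insert together is sketched below. The column types in CREATE_FILEMD_SQL are an assumption, since the source names only the table and its two columns. The store_hashes helper is hypothetical and accepts any DB-API connection that uses "?" placeholders, such as one returned by pyodbc.connect.

```python
# Assumed schema: the source names only the table (filemd) and its columns.
CREATE_FILEMD_SQL = "CREATE TABLE filemd (filepath VARCHAR(255), md5 VARCHAR(32))"
INSERT_FILEMD_SQL = "INSERT INTO filemd (filepath, md5) VALUES (?, ?)"


def store_hashes(conn, rows, create_table=False):
    """Insert (filepath, md5) pairs in one batch and commit.

    `conn` is a DB-API connection using '?' placeholders.
    Set `create_table=True` only when the filemd table does not exist yet.
    Returns the number of rows submitted.
    """
    rows = list(rows)
    cursor = conn.cursor()
    if create_table:
        cursor.execute(CREATE_FILEMD_SQL)
    if rows:
        # One executemany call replaces a separate execute per file
        cursor.executemany(INSERT_FILEMD_SQL, rows)
    conn.commit()
    return len(rows)
```

With pyodbc, calling cursor.tables(table="filemd", tableType="TABLE").fetchone() before the insert is one way to decide whether create_table should be set. A 255-character column will not hold very long paths, so adjust the type if your folder structure is deep.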

