SoFunction
Updated on 2025-04-20

Reading files inside nested zip archives with Python

Updated: April 20, 2025 13:37:36 Author: Yuan Yuan Yuan Yuan Yuan Man
At work I needed to read files inside nested compressed archives (a zip inside a zip) using Python. This article walks through a solution in detail, with code examples for reference.

Ideas

  1. Open the outer zip archive and iterate over its entries
    • Use the statement with zipfile.ZipFile(outer_zip_path, 'r') as outer_zip to open the outer zip archive at the user-supplied path in read mode ('r'). The file is closed automatically when the block ends, which avoids resource leaks.
    • Call outer_zip.namelist() to get the names of all files and folders in the outer archive and iterate over them. For each name, use if file_name.endswith('.zip') to check whether it is an inner archive (that is, the name ends with .zip). If so, continue with the inner-archive processing described next.
  2. Process the inner archive
    • First, print the name of the inner archive so the user knows which one is currently being processed.
    • Then read the inner archive's binary data with inner_zip_data = outer_zip.read(file_name), and use with zipfile.ZipFile(BytesIO(inner_zip_data), 'r') as inner_zip to wrap that data in a BytesIO object, which acts as a temporary in-memory zip file, and open it in read mode for the following steps.
    • Next, call inner_zip.namelist() to get the names of all files in the inner archive, iterate over them, and print them one by one to show everything the inner archive contains.
  3. Read the file contents in the inner archive
    • For each file name in the inner archive, read the file's binary data with file_data = inner_zip.read(inner_file_name).
    • Then decode the binary data as text and print it with print(file_data.decode('utf-8')). This assumes the file content is UTF-8 encoded; in practice the encoding can be adjusted to suit the data. The content is displayed correctly only if the file is a text file.
    • Wrap the read and decode steps in a try-except block to catch possible errors:
      • If a UnicodeDecodeError is raised, UTF-8 cannot decode the file contents, most likely because the file is not a text file. A corresponding message is printed.
      • If any other exception occurs (caught by except Exception as e), the specific error message is printed to tell the user that another problem occurred while reading the file.
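The central step above, opening an inner archive directly from bytes held in memory, can be tried without any files on disk. The following is a minimal sketch, not the article's full program: the helper names build_nested_zip and list_inner_files are hypothetical, and the archive contents are made-up test data.

```python
import zipfile
from io import BytesIO


def build_nested_zip() -> bytes:
    """Build a small outer zip in memory that contains one inner zip (test data only)."""
    inner_buf = BytesIO()
    with zipfile.ZipFile(inner_buf, "w") as inner_zip:
        inner_zip.writestr("hello.txt", "hello from the inner archive")

    outer_buf = BytesIO()
    with zipfile.ZipFile(outer_buf, "w") as outer_zip:
        outer_zip.writestr("inner.zip", inner_buf.getvalue())
        outer_zip.writestr("readme.txt", "a plain file in the outer archive")
    return outer_buf.getvalue()


def list_inner_files(outer_bytes: bytes) -> dict:
    """Map each inner archive name to the list of file names it contains."""
    result = {}
    with zipfile.ZipFile(BytesIO(outer_bytes), "r") as outer_zip:
        for file_name in outer_zip.namelist():
            if file_name.endswith(".zip"):
                # Read the inner archive as bytes and reopen it from memory
                inner_zip_data = outer_zip.read(file_name)
                with zipfile.ZipFile(BytesIO(inner_zip_data), "r") as inner_zip:
                    result[file_name] = inner_zip.namelist()
    return result


print(list_inner_files(build_nested_zip()))  # {'inner.zip': ['hello.txt']}
```

Because BytesIO is a seekable file-like object, zipfile.ZipFile accepts it in place of a path. The trade-off is that the whole inner archive is held in memory, which may not suit very large archives.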

Complete code

Here is a Python code example that takes the path of an outer zip archive as input, prints the name of each inner archive and all file names inside it, and reads the contents of the files in the inner archive. The content is simply printed as text here; you can process it further according to the actual file type, for example images or documents. The code uses the zipfile module to handle the zip archives:

import zipfile
from io import BytesIO


def process_nested_zips(outer_zip_path):
    with zipfile.ZipFile(outer_zip_path, 'r') as outer_zip:
        # Iterate over all entries in the outer archive
        for file_name in outer_zip.namelist():
            if file_name.endswith('.zip'):
                print(f"Inner archive name: {file_name}")
                # Read the inner archive into memory. BytesIO stands in for a
                # temporary file here; the archive could also be extracted to
                # a real directory on disk.
                inner_zip_data = outer_zip.read(file_name)
                with zipfile.ZipFile(BytesIO(inner_zip_data), 'r') as inner_zip:
                    inner_file_names = inner_zip.namelist()
                    print(f"All file names in {file_name}:")
                    for inner_file_name in inner_file_names:
                        print(inner_file_name)
                        # Read the file contents from the inner archive
                        try:
                            file_data = inner_zip.read(inner_file_name)
                            print(f"Contents of file {inner_file_name} (shown as text; non-text files may appear garbled):")
                            # Assumes the content is UTF-8 encoded; adjust as needed
                            print(file_data.decode('utf-8'))
                        except UnicodeDecodeError:
                            print(f"File {inner_file_name} cannot be decoded as UTF-8; it may not be a text file")
                        except Exception as e:
                            print(f"Another error occurred while reading file {inner_file_name}: {e}")


outer_zip_path = input("Please enter the path of the outer zip archive: ")
process_nested_zips(outer_zip_path)
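The code above assumes UTF-8 file contents. One way to relax that assumption is to try several encodings in turn. This is a rough sketch: the helper name decode_with_fallback and the default encoding list (UTF-8, then GBK) are assumptions chosen for illustration, not part of the original program.

```python
def decode_with_fallback(data: bytes, encodings=("utf-8", "gbk")):
    """Try each encoding in order; return (text, encoding) or (None, None)."""
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # No candidate encoding worked; the data is probably not text
    return None, None


text, used = decode_with_fallback("你好".encode("gbk"))
print(text, used)  # 你好 gbk
```

Inside process_nested_zips, the call file_data.decode('utf-8') could be replaced with this helper. Note that a permissive encoding such as GBK can also "succeed" on some binary data and produce meaningless text, so a successful decode is not proof that a file is text.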

Code optimization

If the file names inside the archives may use inconsistent encodings, the code can be extended to specify the file-name encoding explicitly. The zipfile.ZipFile constructor accepts a metadata_encoding argument for this purpose (available in read mode, Python 3.11 and later):

import zipfile
from io import BytesIO


def process_nested_zips(outer_zip_path):
    # metadata_encoding sets the encoding used for file names that are not
    # flagged as UTF-8 in the archive (requires Python 3.11+); adjust as needed
    with zipfile.ZipFile(outer_zip_path, 'r', metadata_encoding='utf-8') as outer_zip:
        # Iterate over all entries in the outer archive
        for file_name in outer_zip.namelist():
            if file_name.endswith('.zip'):
                print(f"Inner archive name: {file_name}")
                # Read the inner archive into memory. BytesIO stands in for a
                # temporary file here; the archive could also be extracted to
                # a real directory on disk.
                inner_zip_data = outer_zip.read(file_name)
                # Use the same file-name encoding for the inner archive
                with zipfile.ZipFile(BytesIO(inner_zip_data), 'r', metadata_encoding='utf-8') as inner_zip:
                    inner_file_names = inner_zip.namelist()
                    print(f"All file names in {file_name}:")
                    for inner_file_name in inner_file_names:
                        print(inner_file_name)
                        # Read the file contents from the inner archive
                        try:
                            file_data = inner_zip.read(inner_file_name)
                            print(f"Contents of file {inner_file_name} (shown as text; non-text files may appear garbled):")
                            # Assumes the content is UTF-8 encoded; adjust as needed
                            print(file_data.decode('utf-8'))
                        except UnicodeDecodeError:
                            print(f"File {inner_file_name} cannot be decoded as UTF-8; it may not be a text file")
                        except Exception as e:
                            print(f"Another error occurred while reading file {inner_file_name}: {e}")


outer_zip_path = input("Please enter the path of the outer zip archive: ")
process_nested_zips(outer_zip_path)

In the optimized code above, the file-name encoding is set to utf-8 when each zip file object is opened, in the ZipFile constructor for both the outer and the inner archive. The right value depends on how the archive was created; some archives use GBK, for example. Setting it avoids errors and garbled names caused by file-name encoding problems, which makes the program more robust when handling archives whose file names use different encodings. Determining the correct encoding, however, may require additional information about the archive or further testing.
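On Python versions that cannot pass a file-name encoding to the constructor, a commonly used workaround is to re-decode each name by hand. zipfile decodes names that lack the UTF-8 flag as cp437, so encoding the name back to cp437 recovers the raw bytes, which can then be decoded with the real encoding. The sketch below assumes GBK as that encoding; the helper name fix_zip_name is hypothetical.

```python
import zipfile


def fix_zip_name(info: zipfile.ZipInfo, fallback_encoding: str = "gbk") -> str:
    """Return a readable file name for an archive entry."""
    # Bit 11 (0x800) of the general-purpose flags marks a UTF-8 file name,
    # which zipfile has already decoded correctly
    if info.flag_bits & 0x800:
        return info.filename
    try:
        # Undo the default cp437 decoding, then decode with the assumed encoding
        return info.filename.encode("cp437").decode(fallback_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Leave the name unchanged if the round trip fails
        return info.filename


# Simulate an entry whose GBK-encoded name was decoded as cp437
garbled = zipfile.ZipInfo("测试.txt".encode("gbk").decode("cp437"))
print(fix_zip_name(garbled))  # 测试.txt
```

In the nested-archive code, this would mean iterating over inner_zip.infolist() instead of namelist(), printing fix_zip_name(info) for display, and still passing the original info object to inner_zip.read().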

That concludes this walkthrough of reading files inside nested compressed archives with Python. For more on the topic, see my other related articles.

