Reading Files Inside Nested Zip Archives with Python
Approach
- Open the outer zip archive and iterate over its entries:
  - Use the statement with zipfile.ZipFile(outer_zip_path, 'r') as outer_zip to open, in read mode ('r'), the outer zip file at the path the user entered. The with statement closes the file automatically when the block ends, which avoids resource leaks.
  - Call outer_zip.namelist() to get the names of all files and folders in the outer archive, and loop over them. For each name, if file_name.endswith('.zip') checks whether the entry is an inner archive (that is, whether its name ends in .zip); if so, the inner-archive processing described next begins.
- Process the inner archive:
  - First, print the name of the inner archive so the user knows which one is currently being processed.
  - Then read the inner archive's raw bytes with inner_zip_data = outer_zip.read(file_name), and open them with the statement with zipfile.ZipFile(BytesIO(inner_zip_data), 'r') as inner_zip. BytesIO wraps the bytes in an in-memory file object, so the inner archive can be opened in read mode without first being extracted to disk.
  - Finally, call inner_zip.namelist() to get the names of all entries in the inner archive, and print them one by one to show everything the inner archive contains.
- Read the contents of the files in the inner archive:
  - For each entry name in the inner archive, read its binary data with file_data = inner_zip.read(inner_file_name).
  - Decode the data as UTF-8 and print it with print(file_data.decode('utf-8')), so that the content is displayed if it is a text file. This assumes the content is UTF-8 encoded; in practice, adjust the encoding to match the actual files.
  - Wrap both steps in a try-except block to catch possible errors:
    - A UnicodeDecodeError means the data cannot be decoded as UTF-8, most likely because the file is not a text file; a corresponding message is printed.
    - Any other exception is caught by except Exception as e, and a specific error message is printed to tell the user that another problem occurred while reading the file.
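The three steps above can be tried without any archive on disk. The following is a minimal sketch, not the article's final code: it builds a small nested archive in memory and walks it, returning the decoded text instead of printing it. The function names build_nested_zip and read_nested_zip and the sample entry names are hypothetical, chosen only for illustration.

```python
import zipfile
from io import BytesIO


def build_nested_zip() -> bytes:
    """Build a tiny outer archive holding one inner archive (hypothetical fixture)."""
    inner_buf = BytesIO()
    with zipfile.ZipFile(inner_buf, "w") as inner_zip:
        inner_zip.writestr("hello.txt", "hello from the inner archive")
        inner_zip.writestr("blob.bin", b"\xff\xfe\x00\x01")  # not valid UTF-8

    outer_buf = BytesIO()
    with zipfile.ZipFile(outer_buf, "w") as outer_zip:
        outer_zip.writestr("inner.zip", inner_buf.getvalue())
        outer_zip.writestr("readme.txt", "a file in the outer archive")
    return outer_buf.getvalue()


def read_nested_zip(outer_source) -> dict:
    """Map each inner archive name to {entry name: text, or None if not UTF-8}."""
    result = {}
    with zipfile.ZipFile(outer_source, "r") as outer_zip:
        for file_name in outer_zip.namelist():
            if not file_name.endswith(".zip"):
                continue
            # Open the inner archive straight from memory via BytesIO
            inner_zip_data = outer_zip.read(file_name)
            with zipfile.ZipFile(BytesIO(inner_zip_data), "r") as inner_zip:
                entries = {}
                for inner_file_name in inner_zip.namelist():
                    file_data = inner_zip.read(inner_file_name)
                    try:
                        entries[inner_file_name] = file_data.decode("utf-8")
                    except UnicodeDecodeError:
                        entries[inner_file_name] = None  # probably binary
                result[file_name] = entries
    return result


contents = read_nested_zip(BytesIO(build_nested_zip()))
print(contents)
```

ZipFile accepts a file-like object as well as a path, which is why the same call works for both the outer buffer here and the inner BytesIO wrapper.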
Complete code
Below is a Python example that takes the path of an outer zip archive, prints the name of each inner archive and the names of all files inside it, and reads the contents of those files. Here the content is simply printed as text; depending on the actual file type (images, documents, and so on), you can process it differently. The code uses the zipfile module to handle the zip archives:
import zipfile
from io import BytesIO


def process_nested_zips(outer_zip_path):
    with zipfile.ZipFile(outer_zip_path, 'r') as outer_zip:
        # Iterate over all entries in the outer archive
        for file_name in outer_zip.namelist():
            if file_name.endswith('.zip'):
                print(f"Inner archive name: {file_name}")
                # Load the inner archive into memory with BytesIO instead of
                # extracting it to a temporary directory (it could also be
                # extracted to a real directory on disk)
                inner_zip_data = outer_zip.read(file_name)
                with zipfile.ZipFile(BytesIO(inner_zip_data), 'r') as inner_zip:
                    inner_file_names = inner_zip.namelist()
                    print(f"All file names in {file_name}:")
                    for inner_file_name in inner_file_names:
                        print(inner_file_name)
                        # Read the file contents in the inner archive
                        try:
                            file_data = inner_zip.read(inner_file_name)
                            print(f"Contents of file {inner_file_name} (shown as text; non-text files may appear garbled):")
                            # Assumes UTF-8 content; adjust to the actual encoding
                            print(file_data.decode('utf-8'))
                        except UnicodeDecodeError:
                            print(f"File {inner_file_name} cannot be decoded as UTF-8; it may not be a text file")
                        except Exception as e:
                            print(f"Another error occurred while reading file {inner_file_name}: {e}")


outer_zip_path = input("Please enter the path of the outer zip archive: ")
process_nested_zips(outer_zip_path)
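An entry whose name ends in .zip is not guaranteed to be a valid archive, and opening a corrupt one raises zipfile.BadZipFile outside the try-except block above. As a hedged sketch of one way to guard against this, the helper below checks each candidate with zipfile.is_zipfile before it is opened, skips folder entries, and matches the extension case-insensitively. The name find_inner_archives and the sample entries are hypothetical.

```python
import zipfile
from io import BytesIO


def find_inner_archives(outer_source) -> list:
    """Return names of entries in the outer archive that really are zip archives."""
    found = []
    with zipfile.ZipFile(outer_source, "r") as outer_zip:
        for info in outer_zip.infolist():
            if info.is_dir():
                continue  # folder entries carry no data
            if not info.filename.lower().endswith(".zip"):
                continue  # case-insensitive, so DATA.ZIP is also considered
            data = outer_zip.read(info)
            # is_zipfile accepts a file-like object as well as a path
            if zipfile.is_zipfile(BytesIO(data)):
                found.append(info.filename)
    return found


# Hypothetical fixture: one real inner archive, one impostor, one text file
inner_buf = BytesIO()
with zipfile.ZipFile(inner_buf, "w") as z:
    z.writestr("a.txt", "text")

outer_buf = BytesIO()
with zipfile.ZipFile(outer_buf, "w") as z:
    z.writestr("GOOD.ZIP", inner_buf.getvalue())
    z.writestr("fake.zip", b"this entry only pretends to be a zip archive")
    z.writestr("notes.txt", "plain text")

outer_buf.seek(0)
inner_names = find_inner_archives(outer_buf)
print(inner_names)
```

The names returned by such a helper could then be passed through the same inner-archive processing as in the code above.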
Code optimization
If the file names inside the archives may have been written with inconsistent encodings, the code can be extended to specify the file-name encoding when each archive is opened:
import zipfile
from io import BytesIO


def process_nested_zips(outer_zip_path):
    # metadata_encoding (Python 3.11+, read mode only) sets the encoding used
    # for member names that are not flagged as UTF-8; adjust it as needed
    with zipfile.ZipFile(outer_zip_path, 'r', metadata_encoding='utf-8') as outer_zip:
        # Iterate over all entries in the outer archive
        for file_name in outer_zip.namelist():
            if file_name.endswith('.zip'):
                print(f"Inner archive name: {file_name}")
                # Load the inner archive into memory with BytesIO instead of
                # extracting it to a temporary directory (it could also be
                # extracted to a real directory on disk)
                inner_zip_data = outer_zip.read(file_name)
                # Use the same file-name encoding for the inner archive
                with zipfile.ZipFile(BytesIO(inner_zip_data), 'r', metadata_encoding='utf-8') as inner_zip:
                    inner_file_names = inner_zip.namelist()
                    print(f"All file names in {file_name}:")
                    for inner_file_name in inner_file_names:
                        print(inner_file_name)
                        # Read the file contents in the inner archive
                        try:
                            file_data = inner_zip.read(inner_file_name)
                            print(f"Contents of file {inner_file_name} (shown as text; non-text files may appear garbled):")
                            # Assumes UTF-8 content; adjust to the actual encoding
                            print(file_data.decode('utf-8'))
                        except UnicodeDecodeError:
                            print(f"File {inner_file_name} cannot be decoded as UTF-8; it may not be a text file")
                        except Exception as e:
                            print(f"Another error occurred while reading file {inner_file_name}: {e}")


outer_zip_path = input("Please enter the path of the outer zip archive: ")
process_nested_zips(outer_zip_path)
In this version, the ZipFile constructor for both the outer and the inner archive is given a file-name encoding of utf-8 through the metadata_encoding parameter, which is available from Python 3.11 and only in read mode. The setting applies to member names that are not flagged as UTF-8 in the archive, which zipfile otherwise decodes as cp437. Choose the value to match how the archive was created; some archives use GBK, for example. Setting it correctly avoids errors and garbled names caused by file-name encoding problems, and makes the program more robust when handling archives whose file names use different encodings. However, determining the correct encoding may require additional information or further testing.
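When the file-name encoding cannot be set at open time (for example on a Python version whose ZipFile lacks such a parameter), a commonly used workaround is to re-decode each name afterwards. The sketch below assumes the names were really written as GBK and that zipfile fell back to cp437 when reading them; the helper name recover_name and its default encoding are hypothetical, and a wrong guess simply leaves the name unchanged.

```python
import zipfile

UTF8_FLAG = 0x800  # general-purpose bit 11: the name is stored as UTF-8


def recover_name(info: zipfile.ZipInfo, encoding: str = "gbk") -> str:
    """Re-decode a member name that zipfile decoded as cp437 by default."""
    if info.flag_bits & UTF8_FLAG:
        return info.filename  # already decoded correctly as UTF-8
    try:
        # Undo the cp437 decoding, then decode with the assumed real encoding
        return info.filename.encode("cp437").decode(encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return info.filename  # the guess was wrong; keep the name as reported


# Simulate what zipfile reports for a GBK-encoded name without the UTF-8 flag
garbled = "测试.txt".encode("gbk").decode("cp437")
info = zipfile.ZipInfo(garbled)
fixed = recover_name(info)
print(fixed)
```

In a real loop, the helper would be applied to each item of infolist() for display purposes, while the original info object is still the one passed to read().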
That covers the method for reading files inside nested zip archives with Python. For more on reading nested archives in Python, see my other related articles.