This article covers basic and advanced file operations in Python. The basic part covers reading and writing files, file and directory management, error handling, file path operations, file encodings, processing large files, temporary files, file permissions, and a simple file search engine example. The advanced part covers file modes, buffering, file locking, advanced file search techniques, file system monitoring, cross-platform path handling, performance considerations, security, and an improved version of the file search engine.
Basics
Read and write files
Sample code:
# Read the file ('example.txt' is a placeholder file name)
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)

# Write to a file
with open('example.txt', 'w') as file:
    file.write('Hello, World!')
No additional packages need to be installed: Python's built-in open() function can read and write files.
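As a minimal self-contained sketch of the two operations together, the snippet below writes a string and then reads it back. The helper write_then_read and the use of a temporary directory are illustrative assumptions, chosen so the example does not touch any real files.

```python
import os
import tempfile


def write_then_read(path, text):
    """Write text to path, then read it back and return the contents."""
    # 'w' creates the file, or truncates it if it already exists
    with open(path, 'w', encoding='utf-8') as file:
        file.write(text)
    # 'r' reopens the same file for reading
    with open(path, 'r', encoding='utf-8') as file:
        return file.read()


# Work inside a throwaway directory so no real files are touched
with tempfile.TemporaryDirectory() as tmp:
    result = write_then_read(os.path.join(tmp, 'example.txt'), 'Hello, World!')
print(result)
```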
File and directory management
Sample code:
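import os
import shutil

# Create a directory
os.mkdir('new_directory')
# Rename the directory
os.rename('new_directory', 'renamed_directory')
# Delete the file
os.remove('old_file.txt')
# Copy the file ('source.txt' and 'destination.txt' are placeholder names)
shutil.copy('source.txt', 'destination.txt')
# List the contents of the directory
print(os.listdir('.'))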
Module overview:
- os module: provides a wide range of functions for working with files and directories.
- shutil module: provides higher-level operations on files and collections of files.
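The shutil module is most useful when operating on whole directory trees. A small sketch, assuming hypothetical directory names src and dst inside a temporary directory, shows os.makedirs(), shutil.copytree() and shutil.rmtree() working together:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'src')
    # makedirs creates intermediate directories; exist_ok=True tolerates existing ones
    os.makedirs(os.path.join(src, 'sub'), exist_ok=True)
    with open(os.path.join(src, 'sub', 'a.txt'), 'w') as f:
        f.write('data')

    dst = os.path.join(tmp, 'dst')
    # copytree copies the whole directory tree, including files
    shutil.copytree(src, dst)
    copied = os.listdir(os.path.join(dst, 'sub'))

    # rmtree deletes a directory together with everything inside it
    shutil.rmtree(src)
    src_exists = os.path.exists(src)

print(copied, src_exists)
```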
Error handling
It is important to handle potential errors in file operations. For example, trying to open a file that does not exist raises FileNotFoundError. try and except statements let you handle these situations gracefully:
try:
    with open('non_existent_file.txt', 'r') as file:
        content = file.read()
except FileNotFoundError:
    print("The file does not exist.")
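FileNotFoundError is only one of several OSError subclasses that file operations can raise. A sketch of a more defensive reader, using a hypothetical helper read_text_or_none, might catch the specific cases first and fall back to OSError:

```python
def read_text_or_none(path):
    """Return the file's text, or None (after printing a reason) if it cannot be read."""
    try:
        with open(path, 'r', encoding='utf-8') as file:
            return file.read()
    except FileNotFoundError:
        print(f"{path} does not exist.")
    except PermissionError:
        print(f"No permission to read {path}.")
    except IsADirectoryError:
        print(f"{path} is a directory, not a file.")
    except OSError as e:
        # Any other I/O error; the specific subclasses above must come first
        print(f"Could not read {path}: {e}")
    return None


missing = read_text_or_none('non_existent_file.txt')
```

The order of the except clauses matters: FileNotFoundError, PermissionError and IsADirectoryError are all subclasses of OSError, so the general handler must come last. On Windows, opening a directory typically raises PermissionError rather than IsADirectoryError.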
Context Manager
Python's with statement provides a concise way to manage resources, especially files. Using with ensures that the file is closed properly after use, even if an exception occurs during the file operation.
# 'example.txt' is a placeholder file name
with open('example.txt', 'r') as file:
    content = file.read()
    print(content)
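To check the claim that the file is closed even when an exception occurs, this sketch raises an error inside the with block and then inspects the file object's closed attribute. The file name and the simulated RuntimeError are illustrative only.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt')
    with open(path, 'w') as file:
        file.write('Hello')

    try:
        with open(path, 'r') as file:
            content = file.read()
            # Simulate a failure while the file is still open
            raise RuntimeError("processing failed")
    except RuntimeError:
        pass

    # The with statement closed the file even though an exception left the block
    print(content, file.closed)
```

This behaves roughly like calling file.close() in the finally clause of a try/finally statement, but the with form cannot forget the cleanup.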
File path operations
Although the os module provides basic path functions, the pathlib module offers a more object-oriented way to handle file paths, which makes path manipulation more intuitive and easier to maintain:
from pathlib import Path

# Current directory path
current_dir = Path('.')
# List every entry (files and subdirectories) in the current directory
for file in current_dir.iterdir():
    print(file)
# Read a file ('example.txt' is a placeholder file name)
file_path = current_dir / 'example.txt'
with file_path.open('r') as file:
    content = file.read()
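A few more pathlib conveniences, sketched inside a temporary directory with hypothetical names (notes, todo.txt): the / operator for joining, mkdir(parents=True), write_text() and read_text(), the name/stem/suffix attributes, and rglob() for recursive matching.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # The / operator joins path components
    notes = base / 'notes' / 'todo.txt'
    # Create missing parent directories
    notes.parent.mkdir(parents=True, exist_ok=True)
    # write_text/read_text open and close the file for you
    notes.write_text('buy milk', encoding='utf-8')
    text = notes.read_text(encoding='utf-8')
    # Components of the path
    name, stem, suffix = notes.name, notes.stem, notes.suffix
    # Recursive glob for every .txt file under base
    txt_files = [p.relative_to(base).as_posix() for p in base.rglob('*.txt')]

print(text, name, stem, suffix, txt_files)
```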
File encoding
When working with text files, it is very important to consider the encoding of the file. By default, Python opens files with the system's default encoding, which can cause problems when porting code between different systems. Specifying the encoding ensures that the file is read and written correctly:
# Open a file using UTF-8 encoding ('example.txt' is a placeholder file name)
with open('example.txt', 'r', encoding='utf-8') as file:
    content = file.read()
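To show why the encoding matters, this sketch writes a non-ASCII word as UTF-8 and reads it back three ways. The file name is a placeholder; the results follow from how each codec interprets the same bytes.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'cafe.txt')
    # In UTF-8, 'é' is stored as the two bytes 0xC3 0xA9
    with open(path, 'w', encoding='utf-8') as f:
        f.write('café')

    # Reading with the matching encoding round-trips correctly
    with open(path, 'r', encoding='utf-8') as f:
        ok = f.read()

    # Latin-1 decodes each byte as one character, producing mojibake
    with open(path, 'r', encoding='latin-1') as f:
        garbled = f.read()

    # ASCII cannot decode bytes above 0x7F at all
    try:
        with open(path, 'r', encoding='ascii') as f:
            f.read()
        ascii_failed = False
    except UnicodeDecodeError:
        ascii_failed = True

print(ok, garbled, ascii_failed)
```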
Processing large files
Reading a very large file into memory all at once can consume a lot of memory. Iterating over the file object reads it one line at a time, which keeps memory usage low:
with open('large_file.txt', 'r') as file:
    for line in file:
        process(line)  # Process each line; process() stands for your own handler
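Line-by-line iteration suits text files. For binary files, or text with extremely long lines, a common alternative is to read fixed-size chunks. The sketch below (the helper sha256_of_file and its 64 KiB default chunk size are illustrative assumptions) hashes a file this way and checks the result against hashing the same bytes in one go:

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=64 * 1024):
    """Hash a file of any size while holding at most chunk_size bytes in memory."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read(chunk_size) until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'big.bin')
    data = os.urandom(300_000)
    with open(path, 'wb') as f:
        f.write(data)
    chunked = sha256_of_file(path, chunk_size=4096)
    whole = hashlib.sha256(data).hexdigest()

print(chunked == whole)
```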
Temporary files
Sometimes you need temporary files to hold data that is no longer needed once the program finishes. The tempfile module provides functions for creating temporary files and directories:
import tempfile

# Create a temporary file; it is deleted automatically when closed
with tempfile.TemporaryFile('w+t') as temp_file:
    temp_file.write('Hello, World!')
    temp_file.seek(0)  # Go back to the beginning of the file
    print(temp_file.read())
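A file created with TemporaryFile may have no visible name in the file system. When you need a real path, for example to pass to another program, tempfile.NamedTemporaryFile and tempfile.TemporaryDirectory are the usual choices. A sketch with hypothetical file names:

```python
import os
import tempfile

# TemporaryDirectory removes the directory and everything in it on exit
with tempfile.TemporaryDirectory() as tmp_dir:
    scratch = os.path.join(tmp_dir, 'scratch.txt')
    with open(scratch, 'w') as f:
        f.write('temporary data')
    existed_inside = os.path.exists(scratch)
existed_after = os.path.exists(tmp_dir)

# NamedTemporaryFile exposes its path as .name; delete=False keeps it after closing
with tempfile.NamedTemporaryFile('w', suffix='.log', delete=False) as named:
    named.write('log line\n')
    named_path = named.name
with open(named_path) as f:
    named_content = f.read()
os.remove(named_path)  # with delete=False, cleanup is our responsibility

print(existed_inside, existed_after, named_path.endswith('.log'), named_content)
```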
File permissions
On Linux and other UNIX systems, file permissions are crucial to file security. The os module lets you check and modify a file's permissions:
import os

# Make the file read-only for owner, group and others ('example.txt' is a placeholder file name)
os.chmod('example.txt', 0o444)
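The example above only modifies permissions. Checking them can be sketched with os.stat() and the stat module, which can render the mode the way ls -l does, while os.access() tests what the current process may do. The file name is a placeholder, and the sketch assumes a POSIX system:

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt')
    with open(path, 'w') as f:
        f.write('data')

    os.chmod(path, 0o444)  # read-only for owner, group and others
    mode = os.stat(path).st_mode
    # S_IMODE keeps only the permission bits; filemode renders them like `ls -l`
    perms = oct(stat.S_IMODE(mode))
    listing = stat.filemode(mode)

    # Give the owner write permission back, then ask whether we may write
    os.chmod(path, stat.S_IMODE(mode) | stat.S_IWUSR)
    writable_again = os.access(path, os.W_OK)

print(perms, listing, writable_again)
```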
Comprehensive example – a simple file search engine
This example implements a simple file search engine: the user specifies a root directory and a file name (or part of one), and the script searches that directory and all its subdirectories for matching files.
import os
import time


def find_files(directory, filename):
    matches = []
    # Traverse the root directory and all its subdirectories
    for root, dirnames, filenames in os.walk(directory):
        for name in filenames:
            # Check whether the file name contains the search keyword (case-insensitive)
            if filename.lower() in name.lower():
                matches.append(os.path.join(root, name))
    return matches


# User input
root_directory = input("Please enter the root directory to search: ")
file_to_find = input("Please enter the file name to search for (partial matches supported): ")

# Record the start time
start_time = time.time()
# Search for files
found_files = find_files(root_directory, file_to_find)
# Record the end time
end_time = time.time()

# Output the results
print(f"{len(found_files)} files were found:")
for file in found_files:
    print(file)
# Output the elapsed time
print(f"Search time: {end_time - start_time:.2f} seconds")
This script uses the os.walk() function, which traverses every subdirectory of the given directory. The script adds the full path of each matching file to a list and prints those paths once the search is complete.
The user is first prompted for the root directory and the file name to search for. The script then calls find_files to perform the search, and the results show the number of files found and their paths.
Note that file name matching is case-insensitive, because the script converts both the file name and the search keyword to lowercase with the .lower() method.
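One possible refinement, sketched here as an assumption rather than part of the original script, is to turn find_files into a generator (the name iter_matching_files is hypothetical) so each match can be printed as soon as it is found, and to lowercase the keyword once instead of on every comparison:

```python
import os
import tempfile


def iter_matching_files(directory, keyword):
    """Yield paths whose file name contains keyword, ignoring case."""
    keyword = keyword.lower()  # lowercase the keyword once, not per file
    for root, _dirnames, filenames in os.walk(directory):
        for name in filenames:
            if keyword in name.lower():
                yield os.path.join(root, name)


# Demo on a small throwaway tree
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'a', 'b'))
    for rel in ('Report.TXT', os.path.join('a', 'report_2024.csv'), os.path.join('a', 'b', 'notes.md')):
        open(os.path.join(tmp, rel), 'w').close()
    found = sorted(os.path.relpath(p, tmp) for p in iter_matching_files(tmp, 'report'))

print(found)
```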
$ python3
Please enter the root directory to search: /DB6/project
Please enter the file name to search for (partial matches supported):
531 files were found:
/DB6/project/blog/BlogSSR/node_modules/@kangc/v-md-editor/src/components/scrollbar/
......
Search time: 46.71 seconds
Advanced
File modes in detail
When calling the open() function, you choose a mode that determines whether the file can be read or written and how writes behave.
# Write mode: if the file exists, its contents are overwritten
# ('example.txt' and 'example.bin' are placeholder file names)
with open('example.txt', 'w') as file:
    file.write('Hello, Python!')
# Append mode: written content is added to the end of the file
with open('example.txt', 'a') as file:
    file.write('\nAppend text.')
# Binary write mode
with open('example.bin', 'wb') as file:
    file.write(b'\x00\xFF')
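Two more modes are worth knowing: 'x' creates a file but fails if it already exists, and 'r+' opens an existing file for both reading and writing without truncating it. A sketch using a placeholder file in a temporary directory:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example.txt')

    # 'x' creates a new file and raises FileExistsError if it already exists
    with open(path, 'x') as f:
        f.write('Hello, Python!')
    try:
        open(path, 'x')
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

    # 'r+' keeps the existing content; writing at position 0 overwrites in place
    with open(path, 'r+') as f:
        f.write('J')
        f.seek(0)
        updated = f.read()

print(exclusive_failed, updated)
```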
Buffering
Buffering is an important concept in file operations: it affects when written data actually reaches the file. Python lets you control a file's buffering behavior through the buffering argument of open().
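# Open a file in unbuffered mode; buffering=0 is only allowed in binary mode
# ('example.txt' is a placeholder file name)
with open('example.txt', 'rb', buffering=0) as file:
    print(file.read())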
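The interaction between modes and buffering can be checked directly. This sketch (the file name is hypothetical) confirms that unbuffered text I/O is rejected, and that with buffering=1 (line buffering, for text mode) a newline pushes data to the file, while flush() forces out a partial line:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.txt')

    # buffering=0 (unbuffered) is only accepted in binary mode
    try:
        open(path, 'w', buffering=0)
        text_unbuffered_ok = True
    except ValueError:
        text_unbuffered_ok = False

    # buffering=1 selects line buffering in text mode
    with open(path, 'w', buffering=1) as writer:
        writer.write('first line\n')
        # A separate handle already sees the line: the newline triggered a flush
        with open(path) as reader:
            seen_after_newline = reader.read()

        writer.write('partial')
        writer.flush()  # push buffered data to the OS without closing the file
        with open(path) as reader:
            seen_after_flush = reader.read()

print(text_unbuffered_ok, repr(seen_after_newline), repr(seen_after_flush))
```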
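File locking
In a multi-threaded or multi-process environment, file locks can be used to avoid data conflicts. The third-party portalocker package (installed with pip install portalocker) provides cross-platform file locks: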
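import portalocker  # Third-party package: pip install portalocker

# 'example.txt' is a placeholder file name
with open('example.txt', 'a') as file:
    # Acquire an exclusive lock (blocks until it is available)
    portalocker.lock(file, portalocker.LOCK_EX)
    file.write('Locked file.\n')
    # Release the lock
    portalocker.unlock(file)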
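If installing portalocker is not an option and the code only needs to run on Linux or macOS, the standard-library fcntl module offers a comparable lock. The sketch below is POSIX-only (fcntl does not exist on Windows), and append_locked is a hypothetical helper name:

```python
import fcntl  # POSIX only (Linux, macOS); not available on Windows
import os
import tempfile


def append_locked(path, text):
    """Append text while holding an exclusive advisory lock on the file."""
    with open(path, 'a') as f:
        fcntl.flock(f, fcntl.LOCK_EX)  # blocks until no other holder remains
        try:
            f.write(text)
            f.flush()  # push data out before releasing the lock
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)  # release even if the write fails


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'shared.log')
    for i in range(3):
        append_locked(path, f'line {i}\n')
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)
```

Like most operating-system file locks, flock() is advisory: it coordinates only processes that also take the lock and does not stop other code from writing to the file.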
Advanced file search techniques
Combining os.walk() with regular expressions makes it possible to implement more complex file search logic.
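import os
import re


def search_files(directory, pattern):
    regex = re.compile(pattern)
    for root, _, files in os.walk(directory):
        for name in files:
            # Print the full path of every file whose name matches the pattern
            if regex.search(name):
                print(os.path.join(root, name))


search_files('.', 'example.*')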
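Note that regex.search() succeeds if the pattern matches anywhere in the name, so 'example.*' also matches a name like my_example.py. This small sketch, using a made-up list of names, compares search(), fullmatch() and a stricter pattern with an escaped dot:

```python
import re

names = ['example.txt', 'my_example.py', 'sample.txt', 'example']
pattern = 'example.*'

# search() succeeds if the pattern matches anywhere in the name
by_search = [n for n in names if re.search(pattern, n)]
# fullmatch() requires the whole name to match the pattern
by_fullmatch = [n for n in names if re.fullmatch(pattern, n)]
# An escaped dot plus fullmatch accepts only names of the form "example.<ext>"
strict = re.compile(r'example\.\w+')
by_strict = [n for n in names if strict.fullmatch(n)]

print(by_search)     # ['example.txt', 'my_example.py', 'example']
print(by_fullmatch)  # ['example.txt', 'example']
print(by_strict)     # ['example.txt']
```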
File system monitoring
The third-party watchdog library (pip install watchdog) can monitor changes to the file system, which is very useful for applications that need to react to file updates in real time.
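import logging
import time
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

logging.basicConfig(level=logging.INFO)  # LoggingEventHandler reports events at INFO level
event_handler = LoggingEventHandler()
observer = Observer()
observer.schedule(event_handler, path='.', recursive=True)  # watch subdirectories too
observer.start()  # the observer runs in a background thread
try:
    while True:
        time.sleep(1)  # keep the main thread alive
except KeyboardInterrupt:
    observer.stop()
observer.join()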
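watchdog's native observers receive change events from the operating system. For a dependency-free approximation, a different technique, polling, can be sketched with the standard library: take snapshots of the tree's modification times and compare them. The helpers snapshot and diff_snapshots are hypothetical names, and a real monitor would call them repeatedly in a loop with time.sleep():

```python
import os
import tempfile


def snapshot(directory):
    """Map every file path under directory to its modification time in nanoseconds."""
    state = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            state[path] = os.stat(path).st_mtime_ns
    return state


def diff_snapshots(before, after):
    """Return (created, deleted, modified) sets of paths between two snapshots."""
    created = after.keys() - before.keys()
    deleted = before.keys() - after.keys()
    modified = {p for p in before.keys() & after.keys() if before[p] != after[p]}
    return created, deleted, modified


with tempfile.TemporaryDirectory() as tmp:
    keep, gone, new = (os.path.join(tmp, n) for n in ('keep.txt', 'gone.txt', 'new.txt'))
    for p in (keep, gone):
        open(p, 'w').close()
    before = snapshot(tmp)

    os.remove(gone)
    open(new, 'w').close()
    # Set an explicit mtime so the change shows up regardless of timestamp resolution
    os.utime(keep, ns=(0, 0))

    created, deleted, modified = diff_snapshots(before, snapshot(tmp))
    names = [sorted(os.path.basename(p) for p in s) for s in (created, deleted, modified)]

print(names)
```

The trade-off is cost: every check walks the whole tree again, which becomes expensive for large directories.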
Cross-platform file path processing
The pathlib module provides an object-oriented way to handle file paths, and its Path objects follow the path conventions of the operating system the code runs on.
from pathlib import Path

# 'example.txt' is a placeholder file name
p = Path('example.txt')
with p.open('r') as file:
    print(file.read())
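To manipulate paths that belong to another operating system, for example Windows paths read from a config file on a Linux server, pathlib also offers pure path classes. A sketch with made-up paths:

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths only manipulate strings and never touch the file system,
# so both flavours work on any operating system
win = PureWindowsPath('C:/Users/alice') / 'docs' / 'report.txt'
posix = PurePosixPath('/home/alice') / 'docs' / 'report.txt'

print(str(win))        # C:\Users\alice\docs\report.txt
print(str(posix))      # /home/alice/docs/report.txt
print(win.as_posix())  # C:/Users/alice/docs/report.txt
print(win.suffix, posix.parent.name)  # .txt docs

# Windows paths compare case-insensitively, POSIX paths do not
print(PureWindowsPath('Report.TXT') == PureWindowsPath('report.txt'))  # True
print(PurePosixPath('Report.TXT') == PurePosixPath('report.txt'))      # False
```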
Performance considerations
The mmap module maps a file into memory, so its contents can be accessed like a bytes object without reading the whole file up front. For large files this can make processing more efficient.
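import mmap

# 'example.txt' is a placeholder file name; the file must not be empty
with open('example.txt', 'r+b') as f:
    # Map the whole file (length 0) into memory
    mm = mmap.mmap(f.fileno(), 0)
    # Read the first line directly from the mapping
    print(mm.readline())
    mm.close()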
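Because a memory map behaves like a bytes object, it can be searched with find() without first loading the file into a Python object. This sketch (the helper count_occurrences and the log contents are hypothetical) counts occurrences of a byte string in a read-only mapping:

```python
import mmap
import os
import tempfile


def count_occurrences(path, needle):
    """Count non-overlapping occurrences of a byte string using a memory map."""
    with open(path, 'rb') as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            count, pos = 0, mm.find(needle)
            while pos != -1:
                count += 1
                pos = mm.find(needle, pos + len(needle))
            return count


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.log')
    with open(path, 'wb') as f:
        f.write(b'ERROR a\nINFO b\nERROR c\n' * 1000)
    errors = count_occurrences(path, b'ERROR')

print(errors)
```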
Security
When dealing with file paths, especially those from users, special care is required to avoid security vulnerabilities.
from pathlib import Path


def safe_open(file_path, root_directory):
    root = Path(root_directory).resolve()
    absolute_path = (root / file_path).resolve()
    # Reject any path that resolves to a location outside the root directory
    if root not in absolute_path.parents:
        raise ValueError("No access to files outside the root directory")
    return open(absolute_path, 'r')


user_path = '../'
try:
    with safe_open(user_path, '.') as file:
        print(file.read())
except ValueError as e:
    print(e)
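On Python 3.9 and later, Path.is_relative_to() expresses the same containment check. A sketch with hypothetical paths; comparing Path objects rather than string prefixes also avoids the trap where /srv/data_backup starts with the string /srv/data:

```python
from pathlib import Path


def is_inside(root_directory, user_path):
    """Return True if user_path, resolved relative to root_directory, stays inside it."""
    root = Path(root_directory).resolve()
    candidate = (root / user_path).resolve()
    # is_relative_to() (Python 3.9+) compares path components, not raw strings
    return candidate.is_relative_to(root)


print(is_inside('/srv/data', 'reports/2024.csv'))  # True
print(is_inside('/srv/data', '../etc/passwd'))     # False
print(is_inside('/srv/data', '../data_backup/x'))  # False, although the string starts with /srv/data
print(is_inside('/srv/data', '/etc/passwd'))       # False: an absolute user path replaces the root
```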
Comprehensive example – an improved file search engine
import os
import re
import time
from concurrent.futures import ThreadPoolExecutor


def search_files(directory, pattern):
    """
    Search for files matching a regular expression in the specified directory.
    """
    matches = []
    regex = re.compile(pattern)
    for root, dirnames, filenames in os.walk(directory):
        for name in filenames:
            if regex.search(name):
                matches.append(os.path.join(root, name))
    return matches


def search_directory(directory, pattern):
    """
    Search a single directory.
    """
    try:
        return search_files(directory, pattern)
    except PermissionError:
        return []  # Ignore permission errors


def main(root_directory, pattern):
    """
    Main function: search directories in parallel and summarize the results.
    """
    start_time = time.time()
    matches = []
    # Use ThreadPoolExecutor to search in parallel
    with ThreadPoolExecutor() as executor:
        futures = []
        for root, dirs, files in os.walk(root_directory):
            for dirname in dirs:
                # Note: each task walks its whole subtree, so nested directories are
                # searched once per ancestor and files directly in the root are skipped
                future = executor.submit(search_directory, os.path.join(root, dirname), pattern)
                futures.append(future)
        # Wait for all threads to complete and summarize the results
        for future in futures:
            matches.extend(future.result())
    end_time = time.time()
    # Print search results
    print(f"{len(matches)} files were found:")
    # for match in matches:
    #     print(match)
    print(f"Search time: {end_time - start_time:.2f} seconds")


if __name__ == "__main__":
    import sys
    if len(sys.argv) != 3:
        print("Usage: python search_engine.py [root directory] [search pattern]")
    else:
        main(sys.argv[1], sys.argv[2])
The script uses the following modules:
- os: used to interact with the operating system, including traversing the directory tree.
- re: used for regular expression matching, to search for file names by pattern.
- time: used to record the start and end times of the search and compute the total time.
- concurrent.futures: used to parallelize search tasks to improve search efficiency.
search_files function
This function takes two parameters, directory (the directory path to search) and pattern (a regular expression), and returns a list of the full paths of all files whose names match the pattern.
- First, it creates an empty list, matches, to store the paths of the matching files it finds.
- It compiles the pattern with re.compile(pattern) for use during the search.
- It iterates over the specified directory and all its subdirectories with os.walk(directory). For each directory, os.walk yields a triple (root, dirnames, filenames): root is the path of the current directory, dirnames is a list of the names of its subdirectories, and filenames is a list of the names of the files in it.
- In each directory, it checks every file name against the pattern with the compiled regex's .search(name) method. If a name matches, the file's full path, built with os.path.join(root, name), is appended to matches.
- Finally, the function returns matches, the list of paths of all matching files found.
search_directory function
This function wraps search_files to search a single directory and handle any PermissionError that may occur.
- It accepts the same parameters as search_files.
- It calls search_files; if a PermissionError is raised (for example, because the process lacks permission to access a directory), the exception is caught and an empty list is returned, meaning no matching files were found.
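One caveat: by default os.walk() does not raise when it cannot list a directory; it silently skips it, so the except PermissionError branch above is rarely, if ever, reached. os.walk() accepts an onerror callback that receives the OSError instead. A sketch with the hypothetical helper walk_with_errors:

```python
import os


def walk_with_errors(directory):
    """Return (file_paths, errors); errors holds the OSErrors os.walk() ran into."""
    files, errors = [], []
    # Without onerror, os.walk() silently skips directories it cannot list;
    # with it, each OSError is passed to the callback (here: appended to a list)
    for root, _dirs, names in os.walk(directory, onerror=errors.append):
        files.extend(os.path.join(root, n) for n in names)
    return files, errors


files, errors = walk_with_errors('this_directory_does_not_exist')
print(len(files), [type(e).__name__ for e in errors])  # 0 ['FileNotFoundError']
```

In the search engine, the collected errors could then be reported at the end instead of being dropped silently.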
main function
This is the script's main function. It sets up the parallel search, collects the results, and prints the number of matching files found and the time the search took.
- It first records the search start time.
- It creates an empty list, matches, to store all matching file paths found.
- It creates a thread pool with ThreadPoolExecutor to run search tasks in parallel. It walks the root directory and all its subdirectories and submits one search_directory task to the pool for each subdirectory.
- It submits each task with executor.submit() and adds the returned Future object to the futures list.
- It waits for every task to finish with future.result() and extends matches with the file paths each task found.
- It records the search end time and calculates the total time.
- It prints the total number of matching files found and the search time. The commented-out lines can be uncommented to print the path of each matching file.
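As written, main() submits a task for every subdirectory at every depth, and each task walks that subdirectory's whole subtree again. A file is therefore found once for each of its ancestor directories below the root, files directly in the root are not searched at all, and the reported count can include duplicates. A sketch of one way to avoid the overlap, giving each top-level subdirectory to exactly one worker (parallel_search and search_tree are hypothetical names):

```python
import os
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor


def search_tree(directory, regex):
    """Walk one subtree once and return the paths of files whose names match regex."""
    return [os.path.join(root, name)
            for root, _dirs, files in os.walk(directory)
            for name in files
            if regex.search(name)]


def parallel_search(root_directory, pattern):
    """Give each top-level subdirectory to one worker so no subtree is walked twice."""
    regex = re.compile(pattern)
    with os.scandir(root_directory) as it:
        entries = list(it)
    # Files directly in the root belong to no subtree, so check them here
    matches = [e.path for e in entries if e.is_file() and regex.search(e.name)]
    subdirs = [e.path for e in entries if e.is_dir(follow_symlinks=False)]
    with ThreadPoolExecutor() as executor:
        # map() yields each subtree's result list in the order of subdirs
        for result in executor.map(search_tree, subdirs, [regex] * len(subdirs)):
            matches.extend(result)
    return matches


# Demo on a small throwaway tree
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'a', 'b'))
    for rel in ('index.html', os.path.join('a', 'index.js'),
                os.path.join('a', 'b', 'index.css'), os.path.join('a', 'b', 'main.py')):
        open(os.path.join(tmp, rel), 'w').close()
    found = sorted(os.path.relpath(p, tmp) for p in parallel_search(tmp, r'index\..*'))

print(found)
```

On the same tree, the original main() would report a/b/index.css twice (through the tasks for a and for a/b) and miss index.html. The trade-off of this sketch is that work is split only at the top level, so a tree dominated by one large subdirectory gains little from the thread pool.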
Script entry
- It checks the number of command-line arguments. If it is not 3 (the script name, the root directory, and the search pattern), it prints usage instructions.
- If the number of arguments is correct, it calls main with the root directory and the search pattern.
Running the script
$ python3 /DB6/project index.*
1409008 files were found:
Search time: 147.67 seconds