
Python enables efficient reading and writing of large files

The previous article introduced how to use Python to read and write files. The question now is: how do you handle large files, and is there a way to improve efficiency? Don't worry, this article covers how to read and write large files efficiently in Python.

Here are some ways to efficiently read and write large files in Python:

1. Read large files line by line

def read_large_file_line_by_line(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            # Process each line of data; only print it here
            print(line.strip())
  • with open(file_path, 'r') as file: the with statement opens the file and ensures it is closed automatically after use.
  • for line in file: the file object is iterable, so the content is read one line at a time instead of the whole file being loaded into memory at once; this saves memory and suits large text files (a generator variant is sketched below).
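
If the lines need to be consumed elsewhere rather than printed, the same idea can be wrapped in a generator. A minimal sketch, assuming a plain-text file at the hypothetical path 'big.log':

def iter_large_file(file_path):
    with open(file_path, 'r') as file:
        for line in file:
            yield line.rstrip('\n')  # hand lines to the caller lazily, one at a time

# Usage: count non-empty lines without ever holding the whole file in memory
print(sum(1 for line in iter_large_file('big.log') if line))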

2. Read large files in chunks

def read_large_file_in_chunks(file_path, chunk_size=1024):
    with open(file_path, 'r') as file:
        while True:
            data = file.read(chunk_size)
            if not data:
                break
            # Process the read data block; only print it here
            print(data)
  • file.read(chunk_size): reads a block of at most chunk_size characters at a time, looping until the end of the file is reached.
  • chunk_size can be adjusted to the situation; a suitable value is usually chosen based on the file size and the available memory (see the sketch below).
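
As an illustration of doing real work per block, here is a minimal sketch that counts characters chunk by chunk; the 1 MB chunk size and the file name 'big.log' are assumptions:

def count_chars_in_chunks(file_path, chunk_size=1024 * 1024):
    total = 0
    with open(file_path, 'r') as file:
        while True:
            data = file.read(chunk_size)  # read at most chunk_size characters
            if not data:
                break
            total += len(data)  # replace with whatever per-chunk processing you need
    return total

print(count_chars_in_chunks('big.log'))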

3. Use the mmap module to perform memory mapped file operations (suitable for large files)

import mmap

def read_large_file_with_mmap(file_path):
    with open(file_path, 'rb') as file:
        with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmap_obj:
            # Process the mapped data; only print the first line here
            print(mmap_obj.readline())
  • mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ): maps the file into memory for efficient access; fileno() returns the file descriptor, and a length of 0 maps the entire file.
  • mmap_obj can be indexed, sliced and searched much like a bytes object, avoiding frequent file I/O calls and improving performance (see the sketch below).
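
Because the mapping behaves like a bytes object, operations such as substring search work directly on it. A minimal sketch, with the file name and keyword as assumptions:

import mmap

def find_in_large_file(file_path, keyword):
    with open(file_path, 'rb') as file:
        with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mmap_obj:
            # find() runs on the mapped region without copying the file into a Python string
            return mmap_obj.find(keyword.encode('utf-8'))  # -1 if not found

print(find_in_large_file('big.log', 'ERROR'))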

4. Use pandas to process large CSV files (suitable for CSV files)

import pandas as pd

def read_large_csv_in_chunks(csv_file_path):
    chunk_size = 100000  # Number of rows per chunk
    for chunk in pd.read_csv(csv_file_path, chunksize=chunk_size):
        # Process each data chunk; only print it here
        print(chunk)
  • pd.read_csv(csv_file_path, chunksize=chunk_size): reads the CSV file in chunks; chunksize is the number of rows per chunk.
  • Each chunk is an ordinary DataFrame, so data cleaning, analysis and aggregation can be done per chunk instead of loading the entire file at once (see the sketch below).
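
For example, a column can be aggregated across chunks so that only one chunk is in memory at a time. A minimal sketch, assuming a CSV file 'sales.csv' with a numeric column named 'amount' (both names are assumptions):

import pandas as pd

def sum_column_in_chunks(csv_file_path, column, chunk_size=100000):
    total = 0.0
    for chunk in pd.read_csv(csv_file_path, chunksize=chunk_size):
        total += chunk[column].sum()  # aggregate per chunk, then combine
    return total

print(sum_column_in_chunks('sales.csv', 'amount'))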

5. Use numpy to process large binary files in chunks (suitable for binary files)

import numpy as np

def read_large_binary_in_chunks(binary_file_path, chunk_size=1024):
    with open(binary_file_path, 'rb') as file:
        while True:
            data = np.fromfile(file, dtype=np.float32, count=chunk_size)
            if data.size == 0:
                break
            # Process each data chunk; only print it here
            print(data)
  • np.fromfile(file, dtype=np.float32, count=chunk_size): reads binary data from the file; dtype is the element type and count is the number of elements to read per chunk.
  • dtype should match the type the file was written with; adjust it and chunk_size to read the binary file chunk by chunk (see the sketch below).
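
The sketch below writes a small demo file with numpy and then sums it chunk by chunk, so it is runnable end to end; the file name 'demo.bin' and the float32 dtype are assumptions:

import numpy as np

def sum_large_binary_in_chunks(binary_file_path, chunk_size=1024):
    total = 0.0
    with open(binary_file_path, 'rb') as file:
        while True:
            data = np.fromfile(file, dtype=np.float32, count=chunk_size)
            if data.size == 0:
                break
            total += float(data.sum())  # aggregate per chunk
    return total

np.arange(10000, dtype=np.float32).tofile('demo.bin')  # create a small demo binary file
print(sum_large_binary_in_chunks('demo.bin'))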

6. Use the itertools module for iterative processing (suitable for text files)

import itertools

def read_large_file_with_itertools(file_path, chunk_size=1024):
    with open(file_path, 'r') as file:
        for chunk in itertools.zip_longest(*[iter(file)]*chunk_size):
            chunk = [line.strip() for line in chunk if line]
            # Process each data chunk; only print it here
            print(chunk)

itertools.zip_longest(*[iter(file)]*chunk_size): passes the same file iterator chunk_size times, so each tuple contains the next chunk_size lines (padded with None at the end of the file), which makes chunk-by-chunk processing convenient.
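
The same grouping can also be written with itertools.islice, which avoids the None padding entirely; a minimal sketch, with the file name as an assumption:

import itertools

def read_large_file_with_islice(file_path, chunk_size=1024):
    with open(file_path, 'r') as file:
        while True:
            chunk = list(itertools.islice(file, chunk_size))  # next chunk_size lines, no padding
            if not chunk:
                break
            print([line.strip() for line in chunk])

read_large_file_with_islice('big.log')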

7. Use the linecache module to read large files line by line (suitable for text files)

import linecache

def read_large_file_with_linecache(file_path, line_number):
    line = linecache.getline(file_path, line_number)
    # Process the data of the specified line; only print it here
    print(line.strip())

linecache.getline(file_path, line_number): returns the specified line from the file, which is convenient when only certain lines are needed. Note that linecache caches the file's lines in memory after the first call, so it is best suited to files that are queried repeatedly rather than to truly huge ones.
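
A minimal usage sketch; the file name and line number are assumptions:

import linecache

# Fetch line 1000 of a hypothetical log file without writing a manual loop
print(linecache.getline('big.log', 1000).strip())

linecache.clearcache()  # drop the cached lines once they are no longer needed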

Summary

When processing large files, choose flexibly among the methods above according to the file type and the operation you need, so that the entire file is never loaded into memory at once; this improves both the performance and the stability of your program. The different modules and functions can also be combined to carry out more complex data processing and analysis tasks. Save this page, you will definitely need it in real work.

This concludes the article on reading and writing large files efficiently in Python. For more on reading and writing large files in Python, please search my previous articles or keep an eye on my future posts. I hope you will continue to support me!