SoFunction
Updated on 2025-03-05

How to handle CSV and Excel files using Python

1. CSV file overview and processing methods

1.1 Basic introduction to CSV file format

CSV (Comma-Separated Values) is a simple plain-text file format for storing tabular data. Each line represents a record, and the fields within a line are separated by commas. CSV files are widely used for data exchange and storage. Their advantages are simplicity, small size, and ease of reading and writing. Their main drawback is that they cannot store formatting or formulas.

For example, a typical CSV file looks like this:

Name,Age,Gender
Alice,25,Female
Bob,30,Male
Charlie,35,Male
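As a quick illustration, the sample content above can be parsed with the standard-library csv module. This is a minimal sketch that reads the text from an in-memory io.StringIO buffer instead of a file, an assumption made only so the example is self-contained. Note that every field comes back as a string, including the ages.

```python
import csv
import io

# The sample CSV content from above, held in memory instead of a file
sample = """Name,Age,Gender
Alice,25,Female
Bob,30,Male
Charlie,35,Male
"""

rows = list(csv.reader(io.StringIO(sample)))
header, records = rows[0], rows[1:]

print(header)      # ['Name', 'Age', 'Gender']
print(records[0])  # ['Alice', '25', 'Female']  (Age is the string '25')
```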

1.2 Use Python's built-in csv module to process CSV files

Python provides the built-in csv module for reading and writing CSV files. It offers a simple interface for working directly with file objects.

Read CSV files

import csv
 
# Open the CSV file; newline='' lets the csv module handle line endings
with open('', mode='r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)

Write to CSV file

import csv
 
# Data preparation
data = [['Name', 'Age', 'Gender'], ['Alice', 25, 'Female'], ['Bob', 30, 'Male']]

# Write to CSV file
with open('', mode='w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)
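The simple comma-separated layout hides one detail: a field that itself contains a comma must be quoted. csv.writer handles this automatically with its default csv.QUOTE_MINIMAL setting. The sketch below writes hypothetical data to an io.StringIO buffer instead of a file to show the effect, then reads it back to confirm the round trip.

```python
import csv
import io

# Hypothetical rows where one field contains a comma
data = [['Name', 'City'], ['Alice', 'Portland, OR']]

buffer = io.StringIO()
writer = csv.writer(buffer)  # default quoting is csv.QUOTE_MINIMAL
writer.writerows(data)
text = buffer.getvalue()

# Only the field containing a comma is quoted
print(text.splitlines()[1])  # Alice,"Portland, OR"

# Reading it back restores the original values
round_trip = list(csv.reader(io.StringIO(text)))
print(round_trip[1])  # ['Alice', 'Portland, OR']
```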

Use DictReader and DictWriter

For key-value access, you can use DictReader and DictWriter, which read and write each row as a dictionary.

import csv
 
# Read the CSV file as dictionaries keyed by the header row
with open('', mode='r', newline='') as file:
    reader = csv.DictReader(file)
    for row in reader:
        print(row)

# Write the CSV file from dictionaries
data = [{'Name': 'Alice', 'Age': 25, 'Gender': 'Female'}, {'Name': 'Bob', 'Age': 30, 'Gender': 'Male'}]
with open('', mode='w', newline='') as file:
    fieldnames = ['Name', 'Age', 'Gender']
    writer = csv.DictWriter(file, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(data)
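To see what DictReader actually yields, this sketch writes the same dictionaries to an in-memory buffer (used instead of a file for self-containment) and reads them back. Each row becomes a dict keyed by the header line. As with csv.reader, the values are strings, so Age has to be converted explicitly; the int() conversion here is an illustrative choice.

```python
import csv
import io

data = [{'Name': 'Alice', 'Age': 25, 'Gender': 'Female'},
        {'Name': 'Bob', 'Age': 30, 'Gender': 'Male'}]

# Write the dictionaries, header first
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['Name', 'Age', 'Gender'])
writer.writeheader()
writer.writerows(data)

# Rewind and read them back as dictionaries
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows[0])  # {'Name': 'Alice', 'Age': '25', 'Gender': 'Female'}

# Convert the Age column back to integers
ages = [int(row['Age']) for row in rows]
print(ages)  # [25, 30]
```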

1.3 Using pandas to process CSV files

pandas is a powerful data analysis library with higher-level, more convenient CSV handling. Its read_csv function loads a CSV file directly into a DataFrame, and the DataFrame.to_csv method writes one back out. Once the data is in a DataFrame, complex operations become straightforward.

Read CSV files

import pandas as pd
 
# Read the CSV file into a DataFrame
df = pd.read_csv('')
print(df)
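Unlike the csv module, read_csv infers column types. The sketch below reads the earlier sample from an io.StringIO buffer instead of a file, for self-containment. It shows that Age is parsed as an integer column, and that the usecols and dtype parameters can limit which columns are read and override the inferred types.

```python
import io

import pandas as pd

sample = """Name,Age,Gender
Alice,25,Female
Bob,30,Male
Charlie,35,Male
"""

df = pd.read_csv(io.StringIO(sample))
print(df.dtypes)  # Age is inferred as an integer column

# Read only two columns and force Age to be a float
df_subset = pd.read_csv(io.StringIO(sample), usecols=['Name', 'Age'], dtype={'Age': float})
print(df_subset)
```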

Write to CSV file

import pandas as pd
 
# Data preparation
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'Gender': ['Female', 'Male']}
df = pd.DataFrame(data)

# Write to a CSV file without the index column
df.to_csv('', index=False)

Data filtering and manipulation

# Filter rows where Age is greater than 30
filtered_df = df[df['Age'] > 30]
print(filtered_df)

# Add a new column (one value per row)
df['Country'] = ['USA', 'UK']
print(df)
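Filter conditions can be combined with & (and) and | (or). Each comparison needs its own parentheses because these operators bind more tightly than comparisons. This is a self-contained sketch with hypothetical data that also shows .loc, which selects rows and columns in one step.

```python
import pandas as pd

# Hypothetical data for the example
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Charlie'],
                   'Age': [25, 30, 35],
                   'Gender': ['Female', 'Male', 'Male']})

# Rows where Age is at least 30 AND Gender is Male
males_30_plus = df[(df['Age'] >= 30) & (df['Gender'] == 'Male')]
print(males_30_plus['Name'].tolist())  # ['Bob', 'Charlie']

# .loc takes a row condition and a column label together
names = df.loc[df['Age'] < 30, 'Name'].tolist()
print(names)  # ['Alice']
```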

2. Excel file overview and processing methods

2.1 Basic introduction to Excel file format

Excel files are spreadsheet files that can hold tabular data, formulas, charts, and other formatted content. There are two common formats:

  • .xls: the binary file format used by Excel 97-2003.
  • .xlsx: the XML-based format used by Excel 2007 and later, which supports more features.

2.2 Use openpyxl to process Excel files

openpyxl is a third-party Python library for reading and writing Excel .xlsx files.

Read Excel files

from openpyxl import load_workbook
 
# Load the Excel file
wb = load_workbook('')
sheet = wb.active

# Read cell data row by row as tuples of values
for row in sheet.iter_rows(values_only=True):
    print(row)
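The read example above needs an existing file. This self-contained sketch first builds a small workbook in an in-memory io.BytesIO buffer, an assumption made only for the demo. It then loads the workbook back, skips the header row with min_row=2, and reads a single cell by its coordinate.

```python
import io

from openpyxl import Workbook, load_workbook

# Build a small workbook in memory so the example needs no file on disk
wb = Workbook()
ws = wb.active
ws.append(['Name', 'Age'])
ws.append(['Alice', 25])
ws.append(['Bob', 30])
buffer = io.BytesIO()
wb.save(buffer)
buffer.seek(0)

# Load it back and read only the data rows, skipping the header
wb2 = load_workbook(buffer)
sheet = wb2.active
records = list(sheet.iter_rows(min_row=2, values_only=True))
print(records)  # [('Alice', 25), ('Bob', 30)]

# Access a single cell by coordinate
print(sheet['A2'].value)  # Alice
print(sheet.max_row)      # 3
```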

Write to Excel file

from openpyxl import Workbook
 
# Create a new Excel file
wb = Workbook()
sheet = wb.active

# Write data
sheet['A1'] = 'Name'
sheet['A2'] = 'Alice'
sheet['B1'] = 'Age'
sheet['B2'] = 25

# Save the Excel file
wb.save('')
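Instead of assigning cells one at a time, you can add a whole row with sheet.append and create additional worksheets with wb.create_sheet. This is a minimal sketch. The sheet names and the output file name 'people.xlsx' are hypothetical.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
sheet.title = 'People'  # rename the default sheet

# append() writes a list as the next row
rows = [['Name', 'Age'], ['Alice', 25], ['Bob', 30]]
for row in rows:
    sheet.append(row)

# Add a second worksheet and write by row/column index (1-based)
summary = wb.create_sheet('Summary')
summary.cell(row=1, column=1, value='Total rows')
summary.cell(row=1, column=2, value=len(rows) - 1)

wb.save('people.xlsx')  # hypothetical output file name
print(wb.sheetnames)  # ['People', 'Summary']
```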

Set cell style

from openpyxl.styles import Font, Alignment

# Set font and alignment (assumes wb and sheet from the previous example)
sheet['A1'].font = Font(bold=True, color="FF0000")
sheet['A1'].alignment = Alignment(horizontal="center")

wb.save('styled_output.xlsx')
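Other properties you may want to adjust include column widths, number formats, and frozen panes. Here is a self-contained sketch. The 'Salary' data and the file name 'styled_report.xlsx' are hypothetical.

```python
from openpyxl import Workbook
from openpyxl.styles import Font

wb = Workbook()
sheet = wb.active
sheet.append(['Name', 'Salary'])
sheet.append(['Alice', 5200.5])  # hypothetical data

# Bold every cell in the header row (sheet[1] is row 1)
for cell in sheet[1]:
    cell.font = Font(bold=True)

# Widen column A and display salaries with two decimals and a thousands separator
sheet.column_dimensions['A'].width = 20
sheet['B2'].number_format = '#,##0.00'

# Freeze the header row so it stays visible while scrolling
sheet.freeze_panes = 'A2'

wb.save('styled_report.xlsx')  # hypothetical file name
```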

2.3 Use xlrd and xlwt to process Excel files

xlrd reads legacy .xls files, and xlwt writes them. Note that xlrd 2.0 and later support only the .xls format, so read .xlsx files with openpyxl instead.

Read Excel file (xlrd)

import xlrd
 
# Open the Excel file
workbook = xlrd.open_workbook('')
sheet = workbook.sheet_by_index(0)

# Read data row by row
for row in range(sheet.nrows):
    print(sheet.row_values(row))

Write to Excel file (xlwt)

import xlwt
 
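# Create an Excel workbook
workbook = xlwt.Workbook()
sheet = workbook.add_sheet('Sheet1')

# Write data: sheet.write(row, column, value) with 0-based indices
sheet.write(0, 0, 'Name')
sheet.write(0, 1, 'Age')
sheet.write(1, 0, 'Alice')
sheet.write(1, 1, 25)

# Save the Excel file
workbook.save('')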

2.4 Using pandas to process Excel files

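pandas also supports Excel files through the read_excel function and the DataFrame.to_excel method, which make reading and writing Excel files straightforward. For .xlsx files, pandas relies on an engine such as openpyxl, so that library must be installed.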

Read Excel files

import pandas as pd
 
# Read the Excel file into a DataFrame
df = pd.read_excel('')
print(df)
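By default, read_excel loads only the first sheet. Passing sheet_name=None returns a dict of DataFrames, one per sheet, and usecols limits which columns are read. To stay self-contained, this sketch first builds a two-sheet workbook in memory with openpyxl. The sheet names and data are hypothetical, and openpyxl is assumed to be installed as the engine.

```python
import io

import pandas as pd
from openpyxl import Workbook

# Build a two-sheet workbook in memory (hypothetical sheet names and data)
wb = Workbook()
people = wb.active
people.title = 'People'
for row in [['Name', 'Age', 'Gender'], ['Alice', 25, 'Female'], ['Bob', 30, 'Male']]:
    people.append(row)
cities = wb.create_sheet('Cities')
for row in [['City'], ['London'], ['Paris']]:
    cities.append(row)
buffer = io.BytesIO()
wb.save(buffer)
xlsx_bytes = buffer.getvalue()

# sheet_name=None reads every sheet into a dict of DataFrames
all_sheets = pd.read_excel(io.BytesIO(xlsx_bytes), sheet_name=None)
print(list(all_sheets))  # ['People', 'Cities']

# Read one sheet by name, keeping only two columns
df = pd.read_excel(io.BytesIO(xlsx_bytes), sheet_name='People', usecols=['Name', 'Age'])
print(df)
```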

Write to Excel file

import pandas as pd
 
# Data preparation
data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'Gender': ['Female', 'Male']}
df = pd.DataFrame(data)

# Write to an Excel file without the index column
df.to_excel('', index=False)
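Calling to_excel with a path writes a workbook with a single sheet. To put several DataFrames into one workbook, one option is pd.ExcelWriter. This sketch writes to an in-memory buffer; a file path such as a hypothetical 'report.xlsx' works the same way. It assumes openpyxl is installed and uses hypothetical sheet names.

```python
import io

import pandas as pd

people = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
summary = pd.DataFrame({'Metric': ['Mean age'], 'Value': [people['Age'].mean()]})

# Write both DataFrames into one workbook, each on its own sheet
buffer = io.BytesIO()
with pd.ExcelWriter(buffer, engine='openpyxl') as writer:
    people.to_excel(writer, sheet_name='People', index=False)
    summary.to_excel(writer, sheet_name='Summary', index=False)

# Read the workbook back to check that both sheets were written
sheets = pd.read_excel(io.BytesIO(buffer.getvalue()), sheet_name=None)
print(list(sheets))  # ['People', 'Summary']
```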

3. Comparison and selection of CSV and Excel files

3.1 Similarities and differences between CSV and Excel

  • CSV files: Plain text files that are easy to store and transfer, but they cannot hold formatting, formulas, or charts. Suitable for storing raw data.
  • Excel files: Support rich formatting, formulas, charts, and more. Suitable when formatting or calculations are required.

3.2 Select the appropriate file format

  • Small amounts of data with no formatting requirements: choose CSV.
  • Formulas, charts, or rich formatting required: choose Excel.

3.3 Optimize the reading and writing of large data files

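  • Use the chunksize parameter of pd.read_csv to read large files in batches.
  • When using openpyxl, avoid loading an entire large workbook into memory at once. load_workbook(..., read_only=True) streams cells while reading, and Workbook(write_only=True) streams rows while writing.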
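One way to apply the openpyxl tip is to use its read-only and write-only modes. In write-only mode, rows can only be added with append. In read-only mode, cells are parsed lazily as you iterate, and the workbook should be closed explicitly. Here is a sketch using a hypothetical file name 'big.xlsx' and generated rows.

```python
from openpyxl import Workbook, load_workbook

# Write-only mode: rows are streamed to the file instead of kept as cell objects
wb = Workbook(write_only=True)
ws = wb.create_sheet('Data')  # write-only workbooks start with no sheets
ws.append(['id', 'value'])
for i in range(1, 1001):
    ws.append([i, i * 2])
wb.save('big.xlsx')  # hypothetical file name

# Read-only mode: cells are loaded lazily while iterating
wb = load_workbook('big.xlsx', read_only=True)
ws = wb['Data']
total = 0
for row in ws.iter_rows(min_row=2, values_only=True):
    total += row[1]
wb.close()  # read-only workbooks keep the file open until closed
print(total)
```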

4. Performance optimization and advanced skills

4.1 Use pandas to optimize the reading and processing of large files

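For large data files, pd.read_csv provides a chunksize parameter that reads a CSV file block by block, so the whole file never has to be loaded into memory at once. read_excel does not accept chunksize, but you can use its nrows and skiprows parameters to read part of a sheet instead.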

import pandas as pd
 
chunk_size = 10000
chunks = pd.read_csv('large_file.csv', chunksize=chunk_size)
for chunk in chunks:
    # Process each chunk (a DataFrame of up to chunk_size rows)
    print(chunk.head())
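Because each chunk is an ordinary DataFrame, you can accumulate results across chunks without ever holding the full file in memory. This sketch computes an overall mean of one column. To keep it runnable, it generates a small hypothetical CSV in memory and uses a tiny chunksize. In practice, the source would be a file such as 'large_file.csv'.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a large file
csv_text = 'id,amount\n' + '\n'.join(f'{i},{i * 10}' for i in range(1, 101))

total = 0
count = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=25):
    # Only the running totals are kept between chunks
    total += chunk['amount'].sum()
    count += len(chunk)

mean_amount = total / count
print(count, mean_amount)  # 100 505.0
```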

4.2 Cleaning and processing of abnormal data

When processing CSV or Excel files, you will often run into missing values and duplicate rows. pandas makes this kind of cleaning easy:

# Remove rows that contain missing values
df.dropna(inplace=True)

# Or, instead of dropping them, fill missing values with 0
df.fillna(0, inplace=True)

# Remove duplicate rows
df.drop_duplicates(inplace=True)
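These methods also accept finer-grained arguments. dropna(subset=...) drops a row only when specific columns are missing, and fillna accepts a dict of per-column fill values. This sketch uses hypothetical data. Filling Age with the median is an illustrative choice, not a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a duplicate row and missing values
df = pd.DataFrame({'Name': ['Alice', 'Bob', 'Bob', 'Dana'],
                   'Age': [25, 30, 30, np.nan],
                   'City': ['London', None, None, 'Paris']})

# Drop exact duplicate rows first (missing values compare as equal here)
deduped = df.drop_duplicates()

# Only drop rows where the key column Name is missing
with_names = deduped.dropna(subset=['Name'])

# Fill each column with its own default instead of a single value
cleaned = with_names.fillna({'Age': with_names['Age'].median(), 'City': 'Unknown'})
print(cleaned)
```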

4.3 Batch processing of CSV and Excel files

To process multiple files, you can use the os module to walk through a folder and read and write the files in a batch.

import os
import pandas as pd
 
for file in os.listdir('csv_files'):
    if file.endswith('.csv'):
        df = pd.read_csv(os.path.join('csv_files', file))
        # Process the DataFrame here, then save the result
        df.to_csv(f'processed_{file}', index=False)
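A variant uses the standard-library pathlib module and writes results into a separate output folder, so processed files are not mixed with the inputs. The function name process_folder, the folder names, and the drop_duplicates step are hypothetical placeholders. The usage part builds a throwaway input folder in a temporary directory so the sketch runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd


def process_folder(src_dir, dst_dir):
    """Read every .csv in src_dir, process it, and write it to dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    processed = []
    for path in sorted(src.glob('*.csv')):
        df = pd.read_csv(path)
        df = df.drop_duplicates()  # placeholder processing step
        df.to_csv(dst / path.name, index=False)
        processed.append(path.name)
    return processed


# Usage: create a demo input folder, process it, and inspect the output
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, 'csv_files')
    src.mkdir()
    (src / 'a.csv').write_text('x\n1\n1\n2\n')
    names = process_folder(src, Path(tmp, 'processed'))
    n_rows = len(pd.read_csv(Path(tmp, 'processed', 'a.csv')))

print(names, n_rows)  # ['a.csv'] 2
```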

5. FAQs and Error Handling

5.1 Handling file encoding issues

Encoding problems are common when working with CSV files. Use the encoding parameter to specify the file's encoding:

df = pd.read_csv('', encoding='utf-8')
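When the encoding is unknown, one approach is to try a few candidates in order and keep the first that decodes without error. The helper name and the candidate list below are assumptions to adjust for your data. 'utf-8-sig' also strips the byte-order mark that Excel often writes, 'gbk' stands in for a legacy regional encoding, and 'latin-1' is a last resort. Note that a successful decode does not prove the guess is correct, since latin-1 accepts any bytes.

```python
import io

import pandas as pd


def read_csv_any_encoding(raw_bytes, encodings=('utf-8-sig', 'gbk', 'latin-1')):
    """Hypothetical helper: return (DataFrame, encoding) for the first encoding that works."""
    for enc in encodings:
        try:
            return pd.read_csv(io.BytesIO(raw_bytes), encoding=enc), enc
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError('none of the candidate encodings could decode the data')


# Usage with in-memory bytes; for a real file, pass Path('data.csv').read_bytes()
gbk_bytes = 'Name,City\n张三,北京\n'.encode('gbk')
df, used = read_csv_any_encoding(gbk_bytes)
print(used)  # gbk
```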

5.2 Processing of missing data values

Missing values are a common problem in data analysis. You can handle them with the dropna and fillna methods provided by pandas.
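Before choosing between dropna and fillna, it usually helps to count how many values are missing. Here is a minimal sketch with hypothetical data that counts missing values per column and measures the share of rows with at least one gap.

```python
import pandas as pd

# Hypothetical data with one missing value in each column
df = pd.DataFrame({'Name': ['Alice', 'Bob', None],
                   'Age': [25, None, 35]})

# Number of missing values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Share of rows that have at least one missing value
share_incomplete = df.isna().any(axis=1).mean()
print(share_incomplete)
```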

5.3 Common errors in reading and writing Excel files

Common errors when processing Excel files with openpyxl or pandas include unsupported file formats (for example, openpyxl cannot open legacy .xls files) and corrupted files. Make sure the file path is correct and use a library that supports the file's actual format.
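Many "incompatible format" errors come from a file whose extension does not match its contents. A quick check is to look at the first bytes. An .xlsx file is a ZIP archive and starts with b'PK\x03\x04', while a legacy .xls file is an OLE2 compound document starting with b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'. The function names and returned labels below are hypothetical. A ZIP signature alone does not prove the archive is a valid workbook.

```python
import os
import tempfile

ZIP_SIGNATURE = b'PK\x03\x04'                         # .xlsx (ZIP container)
OLE2_SIGNATURE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'  # .xls (OLE2 compound file)


def classify_header(head: bytes) -> str:
    """Guess the spreadsheet format from the first bytes of a file."""
    if head.startswith(ZIP_SIGNATURE):
        return 'xlsx'  # openpyxl, or pandas with engine='openpyxl'
    if head.startswith(OLE2_SIGNATURE):
        return 'xls'   # xlrd, or pandas with engine='xlrd'
    return 'unknown'   # e.g. CSV text saved with an Excel extension, or a corrupted file


def sniff_excel_format(path) -> str:
    with open(path, 'rb') as f:
        return classify_header(f.read(8))


# Usage: a CSV file mistakenly saved with an .xlsx extension
with tempfile.NamedTemporaryFile(suffix='.xlsx', delete=False) as tmp:
    tmp.write(b'Name,Age\nAlice,25\n')
result = sniff_excel_format(tmp.name)
os.remove(tmp.name)
print(result)  # unknown
```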
