1. CSV file overview and processing methods
1.1 Basic introduction to CSV file format
A CSV (Comma-Separated Values) file is a simple text format for storing tabular data: each line represents a record, and the fields within a record are separated by commas. CSV files are widely used for data exchange and storage. They are simple, lightweight, and easy to read and write, but they cannot store formatting, formulas, or other rich content.
For example, a typical CSV file content is as follows:
    Name,Age,Gender
    Alice,25,Female
    Bob,30,Male
    Charlie,35,Male
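To make the format concrete, here is a minimal sketch that parses the same sample from an in-memory string with the standard csv module, and then shows how a field containing a comma is quoted when written. The `sample` string stands in for a real file, and the "Smith, Dana" row is a made-up value used only for illustration.

```python
import csv
import io

# The sample table, held in memory instead of a file.
sample = "Name,Age,Gender\nAlice,25,Female\nBob,30,Male\nCharlie,35,Male\n"

rows = list(csv.reader(io.StringIO(sample)))
header, records = rows[0], rows[1:]
print(header)      # column names
print(records[0])  # first record; every field is read as a string

# A field that contains a comma must be quoted so it stays one field.
# The name below is a made-up value used only for illustration.
buffer = io.StringIO()
csv.writer(buffer).writerow(["Smith, Dana", 41, "Female"])
quoted_line = buffer.getvalue()
print(quoted_line)
```

Note that the reader returns every value as text, so numeric columns such as Age need an explicit conversion before arithmetic.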
1.2 Use Python's built-in csv module to process CSV files
Python provides the built-in csv module for reading and writing CSV files. It offers a simple interface that works directly on file objects.
Read CSV files
    import csv

    # Open the CSV file (replace '' with your file path)
    with open('', mode='r', newline='') as file:
        reader = csv.reader(file)
        for row in reader:
            print(row)
Write to CSV file
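    import csv

    # Data preparation
    data = [['Name', 'Age', 'Gender'], ['Alice', 25, 'Female'], ['Bob', 30, 'Male']]

    # Write to the CSV file (replace '' with your file path);
    # newline='' prevents blank lines between rows on Windows
    with open('', mode='w', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(data)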
Use DictReader and DictWriter
For key-value access, you can use DictReader and DictWriter, which read and write each row as a dictionary keyed by column name.
    import csv

    # Read the CSV file as dictionaries (replace '' with your file path)
    with open('', mode='r', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            print(row)

    # Write dictionaries to a CSV file
    data = [{'Name': 'Alice', 'Age': 25, 'Gender': 'Female'},
            {'Name': 'Bob', 'Age': 30, 'Gender': 'Male'}]
    with open('', mode='w', newline='') as file:
        fieldnames = ['Name', 'Age', 'Gender']
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(data)
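One detail worth illustrating: DictReader returns every value as a string. The sketch below, which uses an in-memory string as a stand-in for a file (an assumption for the sake of a self-contained example), converts Age to an integer on the way in and writes back only a chosen subset of columns.

```python
import csv
import io

# In-memory stand-in for a CSV file with a header row (illustrative data).
text = "Name,Age,Gender\nAlice,25,Female\nBob,30,Male\n"

people = []
for row in csv.DictReader(io.StringIO(text)):
    # DictReader yields every value as a string; convert Age explicitly.
    row["Age"] = int(row["Age"])
    people.append(row)

# Write the records back out; extrasaction="ignore" drops keys not in fieldnames.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["Name", "Age"], extrasaction="ignore")
writer.writeheader()
writer.writerows(people)
print(out.getvalue())
```

Without `extrasaction="ignore"`, DictWriter raises a ValueError when a row contains a key that is not listed in `fieldnames`.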
1.3 Using pandas to process CSV files
pandas is a powerful data analysis library that offers more advanced and convenient CSV handling. The read_csv function loads a CSV file directly into a DataFrame, the to_csv method writes a DataFrame back out, and the DataFrame itself supports complex data operations.
Read CSV files
    import pandas as pd

    # Read the CSV file into a DataFrame (replace '' with your file path)
    df = pd.read_csv('')
    print(df)
Write to CSV file
    import pandas as pd

    # Data preparation
    data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'Gender': ['Female', 'Male']}
    df = pd.DataFrame(data)

    # Write to the CSV file (replace '' with your file path)
    df.to_csv('', index=False)
Data filtering and operation
    # Keep rows where Age is greater than 30
    filtered_df = df[df['Age'] > 30]
    print(filtered_df)

    # Add a new column (the list needs one value per row)
    df['Country'] = ['USA', 'UK']
    print(df)
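Filters can also be combined. The following is a small sketch, assuming the three-row sample table from section 1.1 loaded from an in-memory string; the `IsAdult` column is a hypothetical name introduced only to show a derived column.

```python
import io
import pandas as pd

csv_text = "Name,Age,Gender\nAlice,25,Female\nBob,30,Male\nCharlie,35,Male\n"
df = pd.read_csv(io.StringIO(csv_text))

# Combine conditions with & (and) or | (or); each condition needs its own parentheses.
older_men = df[(df["Age"] >= 30) & (df["Gender"] == "Male")]

# Derive a new column from an existing one instead of typing values by hand.
df["IsAdult"] = df["Age"] >= 18
print(older_men)
print(df)
```

Deriving a column from existing data avoids the length mismatch that can occur when a hand-written list has fewer or more entries than the DataFrame has rows.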
2. Excel file overview and processing methods
2.1 Basic introduction to Excel file format
Excel files are spreadsheet files that support tabular data, formulas, charts, and other formatted content. They come in two common formats:
- .xls: the Excel 97-2003 file format, a binary format.
- .xlsx: the XML-based format used by Excel 2007 and later, which supports more features.
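Because the two formats are stored differently on disk, they can usually be told apart by their first bytes, regardless of the file extension. The sketch below is a rough check only: `sniff_excel_format` is a hypothetical helper, and any ZIP archive (not just an .xlsx workbook) would match the second signature.

```python
from pathlib import Path
import tempfile

# Leading bytes that identify each container format.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (binary compound file)
ZIP_MAGIC = b"PK\x03\x04"                        # .xlsx (ZIP archive of XML parts)

def sniff_excel_format(path):
    """Return 'xls', 'xlsx', or 'unknown' based on the first bytes of the file."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    return "unknown"

# Demo with two throwaway files that contain only the signatures.
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "legacy.bin").write_bytes(OLE2_MAGIC + b"\x00" * 8)
(demo_dir / "modern.bin").write_bytes(ZIP_MAGIC + b"\x00" * 8)
print(sniff_excel_format(demo_dir / "legacy.bin"))
print(sniff_excel_format(demo_dir / "modern.bin"))
```

A check like this can help pick the right library when a file has been renamed with the wrong extension.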
2.2 Use openpyxl to process Excel files
openpyxl is a third-party Python library for reading and writing Excel .xlsx files.
Read Excel files
    from openpyxl import load_workbook

    # Load the Excel file (replace '' with your file path)
    wb = load_workbook('')
    sheet = wb.active

    # Read cell data row by row
    for row in sheet.iter_rows(values_only=True):
        print(row)
Write to Excel file
    from openpyxl import Workbook

    # Create a new Excel file
    wb = Workbook()
    sheet = wb.active

    # Write data
    sheet['A1'] = 'Name'
    sheet['A2'] = 'Alice'
    sheet['B1'] = 'Age'
    sheet['B2'] = 25

    # Save the Excel file (replace '' with your file path)
    wb.save('')
Set cell style
    from openpyxl.styles import Font, Alignment

    # Set font and alignment (sheet and wb come from the previous example)
    sheet['A1'].font = Font(bold=True, color="FF0000")
    sheet['A1'].alignment = Alignment(horizontal="center")
    wb.save('styled_output.xlsx')
2.3 Use xlrd and xlwt to process Excel files
xlrd is used for reading .xls files, and xlwt is used for writing .xls files.
Read Excel file (xlrd)
    import xlrd

    # Open the Excel file (replace '' with your file path)
    workbook = xlrd.open_workbook('')
    sheet = workbook.sheet_by_index(0)

    # Read data row by row
    for row in range(sheet.nrows):
        print(sheet.row_values(row))
Write to Excel file (xlwt)
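    import xlwt

    # Create the Excel file
    workbook = xlwt.Workbook()
    sheet = workbook.add_sheet('Sheet1')

    # Write data as (row, column, value)
    sheet.write(0, 0, 'Name')
    sheet.write(0, 1, 'Age')
    sheet.write(1, 0, 'Alice')
    sheet.write(1, 1, 25)

    # Save the Excel file (replace '' with your file path)
    workbook.save('')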
2.4 Using pandas to process Excel files
pandas also provides powerful Excel support: the read_excel function and the to_excel method make it easy to read and write Excel files. pandas relies on an installed engine such as openpyxl to handle .xlsx files.
Read Excel files
    import pandas as pd

    # Read the Excel file into a DataFrame (replace '' with your file path)
    df = pd.read_excel('')
    print(df)
Write to Excel file
    import pandas as pd

    # Data preparation
    data = {'Name': ['Alice', 'Bob'], 'Age': [25, 30], 'Gender': ['Female', 'Male']}
    df = pd.DataFrame(data)

    # Write to the Excel file (replace '' with your file path)
    df.to_excel('', index=False)
3. Comparison and selection of CSV and Excel files
3.1 Similarities and differences between CSV and Excel
- CSV Files: Simple text files, easy to store and transfer, but cannot save complex formats, formulas and charts. Suitable for storing pure data.
- Excel Files: Supports rich formats, formulas, charts and other functions. Suitable for scenarios where complex formats and calculations are required.
3.2 Select the appropriate file format
- Small data volume and no complex format required: Select the CSV format.
- Need to support formulas, charts or complex formats: Select Excel format.
3.3 Optimize the reading and writing of large data files
- Use the chunksize parameter of pandas to read large CSV files in batches.
- When using openpyxl, avoid loading the entire workbook at once; load and save data in batches instead.
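For the openpyxl case, one way to avoid holding a whole workbook in memory is to use its write-only and read-only modes. The sketch below is illustrative only: the file name `big.xlsx`, the sheet name `Data`, and the generated rows are all assumptions, and the file is written to a temporary folder.

```python
import os
import tempfile
from openpyxl import Workbook, load_workbook

path = os.path.join(tempfile.mkdtemp(), "big.xlsx")  # throwaway demo file

# write_only mode streams rows to disk; rows are added with append() only.
wb = Workbook(write_only=True)
ws = wb.create_sheet("Data")
ws.append(["Name", "Age"])
for i in range(1000):
    ws.append([f"user{i}", 20 + i % 50])
wb.save(path)

# read_only mode loads rows lazily instead of building every cell in memory.
wb = load_workbook(path, read_only=True)
ws = wb["Data"]
total_age = 0
row_count = 0
for name, age in ws.iter_rows(min_row=2, values_only=True):
    total_age += age
    row_count += 1
wb.close()  # read-only workbooks keep the file open until closed
print(row_count, total_age)
```

A write-only workbook has no default sheet, which is why the sketch calls `create_sheet` before appending rows.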
4. Performance optimization and advanced skills
4.1 Use pandas to optimize the reading and processing of large files
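For large CSV files, pandas provides the chunksize parameter of read_csv, which reads the file block by block and avoids loading all the data into memory at once. Note that read_excel does not accept chunksize, so large Excel files need a different approach, such as converting them to CSV first or reading them with openpyxl in read-only mode.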
    import pandas as pd

    chunk_size = 10000
    chunks = pd.read_csv('large_file.csv', chunksize=chunk_size)

    for chunk in chunks:
        # Process each chunk of data
        print(chunk.head())
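In practice, chunked reading is usually paired with an aggregation that is updated chunk by chunk, so that only the running result stays in memory. Here is a minimal sketch under stated assumptions: the `city` and `sales` columns and the ten generated rows are made up, and a tiny in-memory string stands in for a large file.

```python
import io
import pandas as pd

# Small in-memory stand-in for a large file (illustrative data).
csv_text = "city,sales\n" + "\n".join(f"{'A' if i % 2 else 'B'},{i}" for i in range(10))

totals = {}
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    # Aggregate each chunk, then fold the partial result into the running totals.
    for city, value in chunk.groupby("city")["sales"].sum().items():
        totals[city] = totals.get(city, 0) + int(value)
print(totals)
```

This pattern works for sums and counts; statistics such as the median cannot be combined from per-chunk results this simply.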
4.2 Cleaning and processing of abnormal data
When processing CSV or Excel files, you often run into missing values and duplicate rows. pandas makes this data easy to clean:
    # Remove rows with missing values
    df.dropna(inplace=True)

    # Alternatively, fill in missing values with 0
    df.fillna(0, inplace=True)

    # Remove duplicate rows
    df.drop_duplicates(inplace=True)
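The same steps can also be applied selectively and chained without modifying the original DataFrame. The sketch below assumes a small made-up table in which Name is required and Age is optional; the column choices are illustrative, not a rule.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Bob", None],
    "Age": [25, None, None, 40],
})

cleaned = (
    df.dropna(subset=["Name"])       # drop rows only when Name is missing
      .fillna({"Age": 0})            # fill missing Age values with 0
      .drop_duplicates()             # remove rows that are now identical
      .reset_index(drop=True)
)
print(cleaned)
```

Because none of these calls uses `inplace=True`, `df` keeps all four original rows and `cleaned` holds the result, which makes it easier to compare before and after.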
4.3 Batch processing of CSV and Excel files
To process multiple files, you can use the os module to traverse a folder and read and write the files in batches.
    import os
    import pandas as pd

    for file in os.listdir('csv_files'):
        if file.endswith('.csv'):
            df = pd.read_csv(f'csv_files/{file}')
            # Process the file here
            df.to_csv(f'processed_{file}', index=False)
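As a variation, the sketch below swaps os.listdir for pathlib and uses only the standard library. It writes results to a separate output folder so that processed files are never picked up again as input. The helper `batch_uppercase_headers`, the folder names, and the "upper-case the header" step are hypothetical placeholders for real processing.

```python
import csv
import tempfile
from pathlib import Path

def batch_uppercase_headers(src_dir, dst_dir):
    """Copy every .csv in src_dir to dst_dir with upper-cased header names."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src.glob("*.csv")):      # sorted for a stable order
        with open(path, newline="") as fin:
            rows = list(csv.reader(fin))
        if not rows:
            continue                             # skip empty files
        rows[0] = [name.upper() for name in rows[0]]
        target = dst / f"processed_{path.name}"
        with open(target, "w", newline="") as fout:
            csv.writer(fout).writerows(rows)
        written.append(target)
    return written

# Demo on a temporary folder with two small files.
work = Path(tempfile.mkdtemp())
(work / "in").mkdir()
(work / "in" / "a.csv").write_text("name,age\nAlice,25\n")
(work / "in" / "b.csv").write_text("name,age\nBob,30\n")
outputs = batch_uppercase_headers(work / "in", work / "out")
print([p.name for p in outputs])
```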
5. FAQs and Error Handling
5.1 Handling file encoding issues
When working with CSV files, you may run into encoding problems. Use the encoding parameter to specify the file's encoding.
df = pd.read_csv('', encoding='utf-8')
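When the encoding is not known in advance, one common workaround is to try a short list of candidates in order. The sketch below uses only the standard library; `read_csv_rows` is a hypothetical helper and the candidate list (`utf-8-sig`, then `cp1252`) is an assumption that should be adapted to where your files come from. `utf-8-sig` also strips the byte-order mark that some spreadsheet tools add to UTF-8 files.

```python
import csv
import os
import tempfile

def read_csv_rows(path, encodings=("utf-8-sig", "cp1252")):
    """Try each encoding in turn and return (rows, encoding_used)."""
    for enc in encodings:
        try:
            with open(path, newline="", encoding=enc) as fh:
                return list(csv.reader(fh)), enc
        except UnicodeDecodeError:
            continue  # wrong guess; try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")

# Demo: a file saved as cp1252, whose accented letters are not valid UTF-8.
demo_path = os.path.join(tempfile.mkdtemp(), "names.csv")
with open(demo_path, "w", newline="", encoding="cp1252") as fh:
    fh.write("name,city\nZoë,Zürich\n")
rows, used = read_csv_rows(demo_path)
print(used, rows)
```

A successful decode does not prove the guess was right, since some encodings accept almost any byte sequence, so it is still worth checking a few values by eye.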
5.2 Processing of missing data values
Missing value processing is a common problem in data analysis and can be handled through the dropna and fillna methods provided by pandas.
5.3 Common errors in reading and writing Excel files
Common errors when using openpyxl or pandas to process Excel files include incompatible file formats and corrupted files. Make sure the file path is correct and use a library that matches the file format (for example, openpyxl for .xlsx and xlrd for .xls).
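Some of these problems can be caught before the file is handed to a library. The following is a rough pre-check using only the standard library; `check_xlsx` and its return strings are hypothetical, and an "ok" result only means the file is a readable ZIP archive, not that it is a valid workbook.

```python
import os
import tempfile
import zipfile

def check_xlsx(path):
    """Return a short diagnosis string for a file expected to be .xlsx."""
    if not os.path.isfile(path):
        return "missing"                 # wrong path or file not there
    if not path.lower().endswith((".xlsx", ".xlsm")):
        return "wrong-extension"         # e.g. a legacy .xls file
    if not zipfile.is_zipfile(path):
        return "corrupt-or-not-xlsx"     # .xlsx files are ZIP archives
    return "ok"

# Demo: a path that does not exist yields the first diagnosis.
missing_path = os.path.join(tempfile.mkdtemp(), "report.xlsx")
print(check_xlsx(missing_path))
```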