Preface
Python provides a variety of ways to read Excel files, the most commonly used libraries arepandas
andopenpyxl
. Below I will explain in detail how to use these two libraries to read Excel files and include some practical examples to help you write a blog.
1. Install the necessary dependencies
First, make sure to install itpandas
andopenpyxl
Library, these two libraries can help us read Excel files easily.
You can install them using the following command:
pip install pandas openpyxl
-
pandas
: For data processing and analysis, it has built-in function to read Excel files. -
openpyxl
: A library that reads and writes Excel files in Excel 2007 and above.
2. Read Excel file
Suppose you have an Excel file, including the following data:
Name | Age | City |
---|---|---|
John | 25 | New York |
Alice | 30 | London |
Bob | 22 | Tokyo |
You can usepandas
To read the contents of the file.
import pandas as pd # Read Excel filedf = pd.read_excel('') # Show dataprint(df)
Output:
Name Age City
0 John 25 New York
1 Alice 30 London
2 Bob 22 Tokyo
3. Read a specific worksheet
Excel files usually contain multiple worksheets. Can be specifiedsheet_name
to read a specific worksheet. For example, assumingThere is a name in the file
Sheet2
Worksheet:
df = pd.read_excel('', sheet_name='Sheet2') print(df)
You can also specify worksheets through indexes:
df = pd.read_excel('', sheet_name=0) # Read the first worksheet
4. Read multiple worksheets
If you want to read all sheets in Excel files at once, you can usesheet_name=None
, it returns a containing multipleDataFrame
dictionary.
dfs = pd.read_excel('', sheet_name=None) # traverse all worksheetsfor sheet, data in (): print(f"Worksheet name: {sheet}") print(data)
5. Read only specific columns
Sometimes, you just need to read some columns in an Excel file. Can be passedusecols
The parameter specifies the column to be read.
df = pd.read_excel('', usecols=['Name', 'City']) print(df)
Output:
Name City
0 John New York
1 Alice London
2 Bob Tokyo
You can also use the column index to read specific columns, for example:
df = pd.read_excel('', usecols=[0, 2]) # Read the first and third columnsprint(df)
6. Process missing data
When reading Excel data, you may encounter blank cells. You can usepandas
Process these missing data. For example, it can be done byna_values
The parameter recognizes a specific value asNaN
, or usefillna()
Method fills in missing values.
# Identify a specific value as NaNdf = pd.read_excel('', na_values=['N/A', 'NA']) # Fill in missing data(value={'Age': 0, 'City': 'Unknown'}, inplace=True) print(df)
7. Convert Excel data to other formats
Sometimes you may need to save the read Excel data in other formats, such as a CSV file or a JSON file.pandas
Allows you to achieve this easily.
# Save as CSV filedf.to_csv('', index=False) # Save as JSON filedf.to_json('', orient='records')
8. Read large files and optimize performance
If the Excel file is very large, it may cause insufficient memory or slow reading.pandas
Provides some methods to optimize performance, such as usingchunksize
Read data in chunks.
# Read Excel files in chunks, reading 100 lines each timechunks = pd.read_excel('large_data.xlsx', chunksize=100) for chunk in chunks: print(chunk)
9. Read Excel using openpyxl
openpyxl
It is more suitable for scenarios where the underlying operations of Excel files are required, such as reading and modifying cell styles, formulas, etc. Here is a simple reading example:
from openpyxl import load_workbook # Load Excel workbookwb = load_workbook('') # Select a worksheetsheet = wb['Sheet1'] # Read the value of the specified cellfor row in sheet.iter_rows(min_row=1, max_row=sheet.max_row, values_only=True): print(row)
10. More Excel read functions
You can usepandas
More options available, such as:
-
skiprows
: Skip a specific number of rows -
nrows
: Read the specified number of rows -
header
: Set custom title line
# Skip the first two lines and read 5 lines of datadf = pd.read_excel('', skiprows=2, nrows=5) print(df)
Summarize
passpandas
andopenpyxl
, you can easily read Excel files and perform various data processing operations.pandas
More suitable for fast and simple data analysis,openpyxl
It is suitable for scenarios where more in-depth control of Excel files is required.
This is the end of this article about the detailed explanation of Python's method of reading Excel table data. For more related Python's content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!