SoFunction
Updated on 2025-03-02

Detailed explanation of how Python uses pandas and openpyxl to read Excel tables

Preface

Python provides a variety of ways to read Excel files, the most commonly used libraries arepandasandopenpyxl. Below I will explain in detail how to use these two libraries to read Excel files and include some practical examples to help you write a blog.

1. Install the necessary dependencies

First, make sure to install itpandasandopenpyxlLibrary, these two libraries can help us read Excel files easily.

You can install them using the following command:

pip install pandas openpyxl

  • pandas: For data processing and analysis, it has built-in function to read Excel files.
  • openpyxl: A library that reads and writes Excel files in Excel 2007 and above.

2. Read Excel file

Suppose you have an Excel file, including the following data:

Name Age City
John 25 New York
Alice 30 London
Bob 22 Tokyo

You can usepandasTo read the contents of the file.

import pandas as pd
# Read Excel filedf = pd.read_excel('')
# Show dataprint(df)

Output:

    Name  Age      City
0   John   25  New York
1  Alice   30    London
2    Bob   22     Tokyo

3. Read a specific worksheet

Excel files usually contain multiple worksheets. Can be specifiedsheet_nameto read a specific worksheet. For example, assumingThere is a name in the fileSheet2Worksheet:

df = pd.read_excel('', sheet_name='Sheet2')
print(df)

You can also specify worksheets through indexes:

df = pd.read_excel('', sheet_name=0)  # Read the first worksheet

4. Read multiple worksheets

If you want to read all sheets in Excel files at once, you can usesheet_name=None, it returns a containing multipleDataFramedictionary.

dfs = pd.read_excel('', sheet_name=None)
# traverse all worksheetsfor sheet, data in ():
    print(f"Worksheet name: {sheet}")
    print(data)

5. Read only specific columns

Sometimes, you just need to read some columns in an Excel file. Can be passedusecolsThe parameter specifies the column to be read.

df = pd.read_excel('', usecols=['Name', 'City'])
print(df)

Output:

    Name      City
0   John  New York
1  Alice    London
2    Bob     Tokyo

You can also use the column index to read specific columns, for example:

df = pd.read_excel('', usecols=[0, 2])  # Read the first and third columnsprint(df)

6. Process missing data

When reading Excel data, you may encounter blank cells. You can usepandasProcess these missing data. For example, it can be done byna_valuesThe parameter recognizes a specific value asNaN, or usefillna()Method fills in missing values.

# Identify a specific value as NaNdf = pd.read_excel('', na_values=['N/A', 'NA'])
# Fill in missing data(value={'Age': 0, 'City': 'Unknown'}, inplace=True)
print(df)

7. Convert Excel data to other formats

Sometimes you may need to save the read Excel data in other formats, such as a CSV file or a JSON file.pandasAllows you to achieve this easily.

# Save as CSV filedf.to_csv('', index=False)
# Save as JSON filedf.to_json('', orient='records')

8. Read large files and optimize performance

If the Excel file is very large, it may cause insufficient memory or slow reading.pandasProvides some methods to optimize performance, such as usingchunksizeRead data in chunks.

# Read Excel files in chunks, reading 100 lines each timechunks = pd.read_excel('large_data.xlsx', chunksize=100)
for chunk in chunks:
    print(chunk)

9. Read Excel using openpyxl

openpyxlIt is more suitable for scenarios where the underlying operations of Excel files are required, such as reading and modifying cell styles, formulas, etc. Here is a simple reading example:

from openpyxl import load_workbook
# Load Excel workbookwb = load_workbook('')
# Select a worksheetsheet = wb['Sheet1']
# Read the value of the specified cellfor row in sheet.iter_rows(min_row=1, max_row=sheet.max_row, values_only=True):
    print(row)

10. More Excel read functions

You can usepandasMore options available, such as:

  • skiprows: Skip a specific number of rows
  • nrows: Read the specified number of rows
  • header: Set custom title line
# Skip the first two lines and read 5 lines of datadf = pd.read_excel('', skiprows=2, nrows=5)
print(df)

Summarize

passpandasandopenpyxl, you can easily read Excel files and perform various data processing operations.pandasMore suitable for fast and simple data analysis,openpyxlIt is suitable for scenarios where more in-depth control of Excel files is required.

This is the end of this article about the detailed explanation of Python's method of reading Excel table data. For more related Python's content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!