Preface
Pandas is one of the core Python libraries for data analysis. It offers a rich set of data-processing functions and is especially powerful for tabular data such as .xlsx files. Combined with Python's flexibility and simplicity, it lets users read, write, clean, manipulate, and analyze data with little effort. This article shows how to use Pandas for common operations on .xlsx files: reading, writing, filtering, merging, and statistics.
1. Environment configuration
1. Install Pandas
First, make sure Pandas and openpyxl (used for reading .xlsx files) are installed. You can install both with the following command:
pip install pandas openpyxl
openpyxl is the library Pandas relies on by default to read .xlsx files, so make sure it is installed correctly.
2. Import Pandas
Before working with files, import Pandas in your code:
import pandas as pd
2. Read Excel files
Pandas provides the pd.read_excel() function, which makes it easy to read .xlsx files.
1. Read a single worksheet
The most common operation is reading a single worksheet from an .xlsx file. The basic usage is:
# Read the first worksheet in an Excel file
df = pd.read_excel('')
# Show the first five rows
print(df.head())
Use the sheet_name parameter to specify which worksheet to read:
# Read the worksheet named "Sheet2"
df = pd.read_excel('', sheet_name='Sheet2')
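As a minimal sketch of the calls above, the following builds a small two-sheet workbook in a temporary directory and reads one sheet back by name and by position. The file name demo.xlsx and the sample values are assumptions made for illustration; they do not come from the article's files.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, used only for this sketch.
people = pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [24, 27]})
scores = pd.DataFrame({"Name": ["Alice", "Bob"], "Score": [85, 62]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "demo.xlsx"  # hypothetical file name
    with pd.ExcelWriter(path) as writer:
        people.to_excel(writer, sheet_name="Sheet1", index=False)
        scores.to_excel(writer, sheet_name="Sheet2", index=False)

    first = pd.read_excel(path)                          # defaults to the first sheet
    by_name = pd.read_excel(path, sheet_name="Sheet2")   # select by sheet name
    by_position = pd.read_excel(path, sheet_name=1)      # select by zero-based position

print(first.head())
print(by_name.equals(by_position))
```

Passing an integer to sheet_name is handy when sheet names vary between files but their order is stable.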
2. Read multiple worksheets
If an Excel file contains several worksheets and you want to read more than one at the same time, pass a list as sheet_name:
# Read multiple worksheets; the result is a dictionary keyed by sheet name
sheets = pd.read_excel('', sheet_name=['Sheet1', 'Sheet2'])
# Get the data of one worksheet
sheet1_df = sheets['Sheet1']
3. Read all worksheets
To read all worksheets, set sheet_name=None:
# Read all worksheets into a dictionary of DataFrames
sheets = pd.read_excel('', sheet_name=None)
# Iterate over the dictionary of worksheets
for sheet_name, data in sheets.items():
    print(f"Sheet name: {sheet_name}")
    print(data.head())
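The dictionary returned by sheet_name=None is often stacked into one table. The sketch below shows one way to do that, assuming every sheet shares the same columns; the workbook name regions.xlsx, the sheet names, and the added Sheet column are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "regions.xlsx"  # hypothetical file name
    with pd.ExcelWriter(path) as writer:
        pd.DataFrame({"Item": ["A", "B"], "Sales": [10, 20]}).to_excel(
            writer, sheet_name="North", index=False
        )
        pd.DataFrame({"Item": ["A", "C"], "Sales": [5, 15]}).to_excel(
            writer, sheet_name="South", index=False
        )

    # sheet_name=None returns {sheet name: DataFrame} for every sheet
    sheets = pd.read_excel(path, sheet_name=None)

# Tag each frame with the sheet it came from, then stack them
frames = [frame.assign(Sheet=name) for name, frame in sheets.items()]
combined = pd.concat(frames, ignore_index=True)
print(combined)
```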
4. Read some columns or rows
Use the usecols parameter to read only specific columns, or nrows to read only some rows:
# Read columns A through C (the first three columns)
df = pd.read_excel('', usecols="A:C")
# Read only the first 10 rows of data
df = pd.read_excel('', nrows=10)
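Besides Excel-style letter ranges, usecols also accepts a list of column names, and it can be combined with nrows. A small sketch follows; the file name people.xlsx and the City column are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

source = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 27, 22, 32],
    "Score": [85, 62, 90, 88],
    "City": ["Oslo", "Lima", "Kyiv", "Rome"],  # hypothetical extra column
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "people.xlsx"  # hypothetical file name
    source.to_excel(path, index=False)

    by_letters = pd.read_excel(path, usecols="A:C")                   # Excel column letters
    by_names = pd.read_excel(path, usecols=["Name", "Score"], nrows=2)  # names plus a row limit

print(by_letters.columns.tolist())
print(by_names)
```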
5. Skip rows
Use the skiprows parameter to skip the first few rows of the file:
# Skip the first 5 rows of the file
df = pd.read_excel('', skiprows=5)
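A typical use of skiprows is a sheet with banner lines above the real header. The sketch below writes such a sheet and skips the two banner rows so that the third row becomes the header; the banner text and file name are made up for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Two banner rows, then the real header row, then the data (all hypothetical)
raw = pd.DataFrame([
    ["Quarterly report", None],
    ["Generated for illustration", None],
    ["Name", "Age"],
    ["Alice", 24],
    ["Bob", 27],
])

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.xlsx"  # hypothetical file name
    raw.to_excel(path, header=False, index=False)

    # An integer skips that many leading rows
    df = pd.read_excel(path, skiprows=2)
    # A list skips specific zero-based row numbers
    df_from_list = pd.read_excel(path, skiprows=[0, 1])

print(df)
```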
3. Write to Excel files
Pandas can write DataFrame data to Excel files with the to_excel() method.
1. Write DataFrame to Excel
Write a DataFrame to an .xlsx file:
df.to_excel('', index=False)
Here, index=False means the row index is not written. To keep the index in the output, omit the parameter or set it to True.
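To make the effect of the index parameter concrete, the sketch below writes the same DataFrame twice and reads both files back. The file names, the row labels r1 and r2, and the index_label value RowId are assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame(
    {"Name": ["Alice", "Bob"], "Score": [85, 62]},
    index=["r1", "r2"],  # hypothetical row labels
)

with tempfile.TemporaryDirectory() as tmp:
    without_index = Path(tmp) / "no_index.xlsx"   # hypothetical file names
    with_index = Path(tmp) / "with_index.xlsx"

    df.to_excel(without_index, index=False)
    df.to_excel(with_index, index=True, index_label="RowId")

    plain = pd.read_excel(without_index)
    # index_col=0 turns the first column back into the row index
    restored = pd.read_excel(with_index, index_col=0)

print(plain.columns.tolist())
print(restored.index.tolist())
```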
2. Write multiple worksheets
To write data to multiple worksheets, use pd.ExcelWriter:
with pd.ExcelWriter('multi_sheet_output.xlsx') as writer:
    df1.to_excel(writer, sheet_name='Sheet1', index=False)
    df2.to_excel(writer, sheet_name='Sheet2', index=False)
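A related case is adding a sheet to a workbook that already exists. The sketch below assumes the openpyxl engine, which supports mode="a" in pd.ExcelWriter; the Summary sheet and the sample frames are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample frames
df1 = pd.DataFrame({"Name": ["Alice"], "Score": [85]})
df2 = pd.DataFrame({"Name": ["Bob"], "Score": [62]})
summary = pd.DataFrame({"Rows": [len(df1) + len(df2)]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "multi_sheet_output.xlsx"

    # Create the workbook with two sheets
    with pd.ExcelWriter(path) as writer:
        df1.to_excel(writer, sheet_name="Sheet1", index=False)
        df2.to_excel(writer, sheet_name="Sheet2", index=False)

    # Reopen in append mode and add a third sheet
    with pd.ExcelWriter(path, mode="a", engine="openpyxl") as writer:
        summary.to_excel(writer, sheet_name="Summary", index=False)

    # List the sheets without loading their data
    with pd.ExcelFile(path) as workbook:
        sheet_names = workbook.sheet_names

print(sheet_names)
```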
3. Customize the header
Use the header parameter to customize the header names or to disable the header:
# Customize the header
df.to_excel('', header=['Col1', 'Col2', 'Col3'], index=False)
# Do not write a header row
df.to_excel('', header=False, index=False)
4. Data operation
After reading Excel files, you can use Pandas' powerful data operation capabilities to process data.
1. Filter data
Assume the data read from Excel looks like this:
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [24, 27, 22, 32],
    'Score': [85, 62, 90, 88]
}
df = pd.DataFrame(data)
Rows can be filtered based on specific conditions:
# Keep only the rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]
print(filtered_df)
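Filters are frequently combined. The sketch below reuses the sample table and shows three common patterns: combining boolean masks with &, matching against a list with isin(), and the equivalent query() string. The thresholds are arbitrary choices for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 27, 22, 32],
    "Score": [85, 62, 90, 88],
})

# Each condition needs its own parentheses when combined with & or |
older_high = df[(df["Age"] > 25) & (df["Score"] >= 80)]

# Keep rows whose Name is in a given list
selected = df[df["Name"].isin(["Alice", "Charlie"])]

# The same compound filter written as a query string
via_query = df.query("Age > 25 and Score >= 80")

print(older_high)
```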
2. Sort data
You can sort the data by the values of a column:
# Sort by Age in ascending order
sorted_df = df.sort_values(by='Age', ascending=True)
print(sorted_df)
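Sorting also works in descending order and across several columns. In this sketch the Class column is a hypothetical addition to the sample table, included only to demonstrate a two-key sort.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Class": ["A", "B", "A", "B"],  # hypothetical column for illustration
    "Age": [24, 27, 22, 32],
    "Score": [85, 62, 90, 88],
})

# Highest score first; reset_index gives a clean 0..n-1 index
by_score = df.sort_values(by="Score", ascending=False).reset_index(drop=True)

# Class ascending, then Score descending within each class
multi = df.sort_values(by=["Class", "Score"], ascending=[True, False])

# Shortcut for "top N by a column"
top2 = df.nlargest(2, "Score")

print(multi)
```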
3. Grouping and Aggregation
The data can be grouped by a column and an aggregate computed for each group:
# Group by Age and calculate the average Score
grouped = df.groupby('Age')['Score'].mean()
print(grouped)
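Grouping is more informative when a key repeats. The sketch below adds a hypothetical Class column and uses named aggregation to compute several statistics per group at once; the output column names are also illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Class": ["A", "B", "A", "B"],  # hypothetical grouping column
    "Age": [24, 27, 22, 32],
    "Score": [85, 62, 90, 88],
})

# Named aggregation: new_column=(source_column, function)
summary = df.groupby("Class").agg(
    mean_score=("Score", "mean"),
    oldest=("Age", "max"),
    members=("Name", "count"),
)

print(summary)
```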
4. Missing value processing
Pandas provides several ways to deal with missing values. For example, you can count them, delete the affected rows, or replace them:
# Count missing values in each column
print(df.isnull().sum())
# Delete rows that contain missing values
df.dropna(inplace=True)
# Alternatively, replace missing values with a fixed value
df.fillna(0, inplace=True)
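The sketch below shows two more targeted variants on a small table with gaps: filling one column with its own mean, and dropping rows only when a particular column is missing. It returns new DataFrames instead of using inplace=True; the sample values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David"],
    "Age": [24, 27, np.nan, 32],
    "Score": [85, np.nan, 90, 88],
})

# Missing values per column
missing_counts = df.isna().sum()

# Fill only the Score column, using the mean of its known values
filled = df.fillna({"Score": df["Score"].mean()})

# Drop rows only when Age is missing; other gaps are kept
complete_age = df.dropna(subset=["Age"])

print(missing_counts)
print(filled)
```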
5. Advanced operations of Excel files
1. Merge multiple Excel files
If you have several Excel files with the same column structure, you can merge them with the pd.concat() function:
import pandas as pd
# Read multiple Excel files
df1 = pd.read_excel('')
df2 = pd.read_excel('')
# Merge the data
df_combined = pd.concat([df1, df2], ignore_index=True)
print(df_combined)
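When the number of files is not fixed, the same idea can be applied to every workbook in a folder. This sketch creates two small files in a temporary directory, then reads and stacks everything matching *.xlsx; the file names and the Source column are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical input files with identical columns
    pd.DataFrame({"Name": ["Alice"], "Score": [85]}).to_excel(
        folder / "part_1.xlsx", index=False
    )
    pd.DataFrame({"Name": ["Bob"], "Score": [62]}).to_excel(
        folder / "part_2.xlsx", index=False
    )

    # Read each workbook and record which file the rows came from
    frames = [
        pd.read_excel(path).assign(Source=path.name)
        for path in sorted(folder.glob("*.xlsx"))
    ]

combined = pd.concat(frames, ignore_index=True)
print(combined)
```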
2. Use custom data types
Use the dtype parameter to specify the data type of the columns being read:
# Read the 'Age' column as strings
df = pd.read_excel('', dtype={'Age': str})
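The following sketch compares the default type inference with an explicit dtype on the same file. The file name ages.xlsx is hypothetical, and the zero-padding step is just one example of a string operation that becomes available once the column is read as text.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "ages.xlsx"  # hypothetical file name
    pd.DataFrame({"Name": ["Alice", "Bob"], "Age": [24, 27]}).to_excel(
        path, index=False
    )

    inferred = pd.read_excel(path)                    # Age is inferred as an integer column
    as_text = pd.read_excel(path, dtype={"Age": str})  # Age is read as strings

# String methods now work on the column, for example zero-padding
padded = as_text["Age"].str.zfill(3)

print(inferred["Age"].dtype)
print(padded.tolist())
```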
3. Process merged cells
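In an Excel file, merged cells can make the data look incomplete after reading. By default, Pandas keeps the value only in the top-left cell of a merged range and reads the remaining cells as NaN. To restore the value for every row the merged cell covered, fill the gaps after reading, for example with a forward fill (note that this also fills values that were genuinely missing):
df = pd.read_excel('data_with_merged_cells.xlsx')
# Propagate each merged value down to the rows it covered
df = df.ffill()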
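A minimal sketch of one common way to handle merged cells, assuming that read_excel() populates only the top-left cell of a merged range and leaves the others as NaN. The workbook is built with openpyxl so the merge is real; the Team and Member columns and the file name are hypothetical. Forward-filling only the merged column avoids touching genuine gaps elsewhere.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import Workbook

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "merged.xlsx"  # hypothetical file name

    # Build a sheet where "Red" spans rows 2-3 of column A
    wb = Workbook()
    ws = wb.active
    ws.append(["Team", "Member"])
    ws.append(["Red", "Alice"])
    ws.append([None, "Bob"])
    ws.append(["Blue", "Carol"])
    ws.merge_cells("A2:A3")
    wb.save(path)

    raw = pd.read_excel(path)

# Forward-fill only the column that contained merged cells
filled = raw.assign(Team=raw["Team"].ffill())

print(raw)
print(filled)
```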
4. Conditional formatting
You can apply condition-based cell styles when writing to an Excel file, for example to highlight the maximum value in each column. The styles are computed by Pandas at write time and stored as fixed cell formats, and the Styler API requires the jinja2 package:
import pandas as pd
# Style function: highlight the maximum value of each column
def highlight_max(s):
    is_max = s == s.max()
    return ['background-color: yellow' if v else '' for v in is_max]
# Create the DataFrame
df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [4, 3, 6],
    'C': [7, 8, 5]
})
# Apply the styles and save them to Excel
styled = df.style.apply(highlight_max)
styled.to_excel('styled_output.xlsx', engine='openpyxl', index=False)
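If the highlighting should update when values are edited in Excel, a rule can be attached to the sheet instead. The sketch below swaps in a different technique from the one above: it uses openpyxl's CellIsRule through the worksheet that pd.ExcelWriter exposes. The threshold of 5, the cell range, the sheet name Data, and the file name are all assumptions for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd
from openpyxl import load_workbook
from openpyxl.formatting.rule import CellIsRule
from openpyxl.styles import PatternFill

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 3, 6], "C": [7, 8, 5]})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "rules_output.xlsx"  # hypothetical file name

    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        df.to_excel(writer, sheet_name="Data", index=False)
        ws = writer.sheets["Data"]  # underlying openpyxl worksheet

        # Highlight any cell in the data range whose value exceeds 5
        yellow = PatternFill(start_color="FFFF00", end_color="FFFF00", fill_type="solid")
        rule = CellIsRule(operator="greaterThan", formula=["5"], fill=yellow)
        ws.conditional_formatting.add("A2:C4", rule)

    # Reload the file to confirm the rule was saved with the sheet
    wb = load_workbook(path)
    rule_count = len(list(wb["Data"].conditional_formatting))
    wb.close()

print(rule_count)
```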
6. Summary
This article showed how to use Pandas to work with .xlsx files, covering reading, writing, data operations, and some advanced operations. Pandas is well suited to cleaning, analyzing, and saving Excel data, which makes complex Excel tasks much easier to handle.
Common operations include:
- Use read_excel() to read the contents of an Excel file, selecting specific worksheets or parts of the data as needed.
- Use to_excel() to write DataFrame data to an Excel file, including multiple worksheets or custom formats.
- Use Pandas' data operations to filter, sort, group, aggregate, and handle missing values.
By mastering these operations, you will be able to process and analyze data in Excel files more efficiently.