In day-to-day data processing, we often encounter an Excel file that contains multiple worksheets (Sheets), each of which needs to be saved as an independent Excel file. Doing this by hand is tedious and error-prone, but with the pandas library in Python the process is easy to automate.
In this blog post, we describe how to use Python to save each worksheet (Sheet) in an Excel file as a separate Excel file.
1. Using the pandas library
pandas can be used to process Excel files as an alternative to working with openpyxl directly. It provides a simple way to read and write Excel files and is especially well suited to workbooks that hold large amounts of data.
Install pandas and openpyxl
pandas relies on openpyxl to handle files in the .xlsx format. If the two libraries are not installed yet, you can install them together with the following command:
pip install pandas openpyxl
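To confirm that the installation worked, a quick check along the following lines can help. This is only a minimal sketch: it imports both libraries and prints the versions that were found.

```python
# Quick sanity check that both libraries can be imported.
import openpyxl
import pandas as pd

print("pandas", pd.__version__)
print("openpyxl", openpyxl.__version__)
```

If either import fails with ModuleNotFoundError, the corresponding package is not installed in the Python environment you are running.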
Code implementation
The code to use pandas to save each worksheet in an Excel file as a separate file is as follows:
import pandas as pd

def split_excel_sheet(input_file):
    # Open the workbook once so that every worksheet can be read from it
    xls = pd.ExcelFile(input_file)
    # Iterate over each worksheet
    for sheet_name in xls.sheet_names:
        # Read the worksheet into a DataFrame
        df = pd.read_excel(xls, sheet_name=sheet_name)
        # Save the worksheet as a separate Excel file
        new_file = f"{sheet_name}.xlsx"
        df.to_excel(new_file, index=False)
        print(f"Worksheet '{sheet_name}' has been saved as {new_file}")

# Usage example
input_file = '/path/to/'  # Excel file that needs to be split
split_excel_sheet(input_file)
Code parsing
Read the Excel file: Open the entire workbook via pd.ExcelFile(input_file).
Iterate over the worksheets: Get all worksheet names in the file via xls.sheet_names and loop through them.
Read worksheet data: Use pd.read_excel() to read the data of each worksheet into a DataFrame.
Save as a separate Excel file: Write the data of each worksheet to its own Excel file with df.to_excel(). The index=False parameter keeps the row index out of the output file.
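The steps above can also be sketched in a slightly different form. The variant below is an illustration, not the method used in this post: it assumes you want the output files collected in one folder, and it reads all worksheets in a single call by passing sheet_name=None to pd.read_excel(), which returns a dictionary mapping worksheet names to DataFrames. The function name split_excel_to_dir and the output_dir parameter are hypothetical names chosen for this example.

```python
from pathlib import Path

import pandas as pd


def split_excel_to_dir(input_file, output_dir):
    """Write each worksheet of input_file to its own .xlsx file in output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # sheet_name=None makes read_excel return {worksheet name: DataFrame}
    sheets = pd.read_excel(input_file, sheet_name=None)

    written = []
    for sheet_name, df in sheets.items():
        target = out / f"{sheet_name}.xlsx"
        df.to_excel(target, index=False)
        written.append(target)
    return written
```

Returning the list of written paths makes it easy to log or post-process the results, and keeping the output in a dedicated folder avoids scattering files across the current working directory.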
Output example
After running the above code, the program prints one message for each worksheet that it saves as a separate Excel file. For example:
Worksheet 'Sheet1' has been saved as Sheet1.xlsx
Worksheet 'Sheet2' has been saved as Sheet2.xlsx
2. Summary
With pandas, backed by openpyxl for the .xlsx format, we can easily save each worksheet in an Excel file as a separate Excel file. pandas is well suited to data analysis and processing and is easy to use, especially for Excel files that contain large amounts of data.
3. Knowledge expansion
Office automation: merging multiple Excel files
We will use the pandas and openpyxl libraries to accomplish this. pandas reads and processes the data, while openpyxl is the library it relies on to handle .xlsx files.
Install the required libraries
First, make sure you have the following Python libraries installed:
pip install pandas openpyxl
Sample code
Suppose you have multiple Excel files in one folder.
Each file contains one worksheet, and the data in every file has the same structure (the column names are the same).
1. Import the library
import os
import pandas as pd
2. Read multiple Excel files and merge
We use the os module to iterate through all Excel files in the specified directory and read data through pandas. Merge the data from each file into a large DataFrame.
def merge_excel_files(input_folder, output_file):
    # Get all Excel files in the folder
    all_files = [f for f in os.listdir(input_folder) if f.endswith('.xlsx')]
    # Initialize an empty DataFrame to store the merged data
    combined_df = pd.DataFrame()
    # Iterate over all files, reading and merging them one by one
    for file in all_files:
        file_path = os.path.join(input_folder, file)
        print(f"Processing file: {file_path}")
        # Read the Excel file
        df = pd.read_excel(file_path)
        # Append the data to the combined DataFrame
        combined_df = pd.concat([combined_df, df], ignore_index=True)
    # Save the merged data to a new Excel file
    combined_df.to_excel(output_file, index=False)
    print(f"The merge is completed, and the result has been saved to: {output_file}")
3. Call the function and run it
Call the merge_excel_files function above and pass in the folder path and the output file path:
# Specify the input folder path and the output file path
input_folder = 'path_to_your_excel_files'  # Replace with your folder path
output_file = 'merged_output.xlsx'  # Output file path

# Call the merge function
merge_excel_files(input_folder, output_file)
Code description
Get the file list: Use os.listdir() to list the specified directory and keep only the file names that end with .xlsx.
Read and merge data: Use pd.read_excel() to read the data of each Excel file and pd.concat() to merge it into one large DataFrame. ignore_index=True renumbers the rows so that the merged data does not contain repeated index values.
Save the merge result: Finally, save the merged data to a new Excel file with the to_excel() method.
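The effect of ignore_index=True is easiest to see on a small example. The sketch below uses two made-up DataFrames (the column names and values are purely illustrative) and compares the resulting index with and without the parameter.

```python
import pandas as pd

# Two small, made-up tables with the same columns
first = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
second = pd.DataFrame({"name": ["c", "d"], "value": [3, 4]})

kept = pd.concat([first, second])                           # original row labels kept
renumbered = pd.concat([first, second], ignore_index=True)  # rows renumbered from 0

print(kept.index.tolist())        # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Without the parameter, each input keeps its own row labels, so the labels 0 and 1 appear twice in the merged result.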
Execution results
After executing the above code, you will see the following output:
Processing file: path_to_your_excel_files/
Processing file: path_to_your_excel_files/
Processing file: path_to_your_excel_files/
The merge is completed, and the result has been saved to: merged_output.xlsx
The merged data will be saved in the file merged_output.xlsx.
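One way to sanity-check the result is to compare the number of rows in the merged file with the total number of rows in the input files. The helper below is a minimal sketch and not part of the original code: the name merged_row_count_matches is hypothetical, and it assumes that every input file holds its data in the first worksheet, which is what pd.read_excel() reads by default.

```python
import os

import pandas as pd


def merged_row_count_matches(input_folder, output_file):
    """Return True if output_file has as many rows as all input files combined."""
    output_abs = os.path.abspath(output_file)
    expected = 0
    for name in os.listdir(input_folder):
        path = os.path.join(input_folder, name)
        # Skip non-Excel files, and the merged file itself if it is in the same folder
        if not name.endswith(".xlsx") or os.path.abspath(path) == output_abs:
            continue
        expected += len(pd.read_excel(path))
    return len(pd.read_excel(output_file)) == expected
```

The check on the output path matters when the merged file is written into the input folder: without it, a second run would count the earlier merge result as one more input.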