Splitting Excel worksheets into separate Excel files with Python

In day-to-day data processing we often encounter an Excel file that contains multiple worksheets (Sheets), each of which needs to be saved as an independent Excel file. Doing this by hand is tedious and error-prone, but with the pandas library in Python the process is easy to automate.

This post describes how to use Python to save each worksheet (Sheet) of an Excel file into a separate Excel file.

1. Using the pandas library

In addition to openpyxl, the pandas library can also be used to process Excel files. pandas provides a simple way to read and write Excel files and is particularly convenient when the worksheets hold large amounts of data.

Install pandas and openpyxl

pandas relies on openpyxl to handle files in the .xlsx format. If they are not installed yet, you can install both with the following command:

pip install pandas openpyxl

Code implementation

The following code uses pandas to save each worksheet in an Excel file as a separate file:

import pandas as pd

def split_excel_sheet(input_file):
    # Open the workbook once so that every worksheet can be read from it
    xls = pd.ExcelFile(input_file)

    # Iterate over each worksheet
    for sheet_name in xls.sheet_names:
        # Read the data of the current worksheet into a DataFrame
        df = pd.read_excel(xls, sheet_name=sheet_name)

        # Save the worksheet as a separate Excel file named after the sheet
        new_file = f"{sheet_name}.xlsx"
        df.to_excel(new_file, index=False)
        print(f"Worksheet '{sheet_name}' has been saved as {new_file}")

# Usage example
input_file = '/path/to/'  # Excel file that needs to be split
split_excel_sheet(input_file)

Code parsing

Read the Excel file: pd.ExcelFile(input_file) opens the workbook so that all of its worksheets can be accessed.

Iterate over the worksheets: xls.sheet_names returns the names of all worksheets in the file, and the loop visits each one in turn.

Read the worksheet data: pd.read_excel() reads the data of each worksheet and returns it as a DataFrame.

Save as a separate Excel file: df.to_excel() writes the data of each worksheet to its own Excel file. The index=False parameter keeps the DataFrame's row index out of the output file.
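The steps above can also be collapsed into a single read. As a minimal sketch, and not the only way to do it: passing sheet_name=None to pd.read_excel() returns a dictionary that maps each worksheet name to its DataFrame. The function name split_excel_all_sheets and the output_dir parameter are assumptions introduced here for illustration.

```python
from pathlib import Path

import pandas as pd


def split_excel_all_sheets(input_file, output_dir="."):
    """Write every worksheet of input_file to its own .xlsx file in output_dir."""
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # sheet_name=None asks pandas for a dict of {sheet name: DataFrame}
    sheets = pd.read_excel(input_file, sheet_name=None)

    written = []
    for sheet_name, df in sheets.items():
        # output_dir is a hypothetical parameter; the original script writes
        # into the current working directory
        target = out_dir / f"{sheet_name}.xlsx"
        df.to_excel(target, index=False)
        written.append(target)
    return written
```

Returning the list of written paths makes the result easy to check or to pass on to a later step. Writing into a dedicated folder also keeps the generated files apart from the source workbook.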

Output example

As with the openpyxl implementation, running the code above prints a message for each worksheet as it is saved to a separate Excel file. For example:

Worksheet 'Sheet1' has been saved as

Worksheet 'Sheet2' has been saved as
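Because each output file is named after its worksheet, a sheet name containing characters that the file system rejects could make the save fail. The helper below is a hedged sketch of one way to guard against that. The name sheet_name_to_filename and the "sheet" fallback are assumptions made for this example, and the character set covers the ones Windows does not allow in file names.

```python
import re


def sheet_name_to_filename(sheet_name, extension=".xlsx"):
    """Turn a worksheet name into a file name that is safe on common file systems."""
    # Replace characters that Windows rejects in file names
    cleaned = re.sub(r'[<>:"/\\|?*]', "_", sheet_name)
    # Drop surrounding whitespace and any trailing dots or spaces
    cleaned = cleaned.strip().rstrip(". ")
    # Fall back to a placeholder if nothing usable is left
    return (cleaned or "sheet") + extension
```

In the splitting function, the line that builds new_file could call this helper instead of formatting the sheet name directly.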

2. Summary

With Python's openpyxl and pandas libraries, we can easily save each worksheet in an Excel file as a separate Excel file. pandas is well suited to data analysis and processing and is easy to use, particularly for Excel files that contain large amounts of data.

3. Going further

Office automation: merging multiple Excel files

We will use the pandas and openpyxl libraries to accomplish this. pandas is suited to reading and processing data, while openpyxl is suited to manipulating Excel files.

Install the required libraries

First, make sure you have the following Python libraries installed:

pip install pandas openpyxl

Sample code

Suppose you have multiple Excel files, each containing one worksheet, and the data in every worksheet has the same structure (the column names are the same).

1. Import the library

import pandas as pd 
import os

2. Read multiple Excel files and merge

We use the os module to iterate through all Excel files in the specified directory and read each one with pandas. The data from every file is then merged into one large DataFrame.

def merge_excel_files(input_folder, output_file):
    # Get all Excel files in the folder
    all_files = [f for f in os.listdir(input_folder) if f.endswith('.xlsx')]

    # Initialize an empty DataFrame to store the merged data
    combined_df = pd.DataFrame()

    # Iterate over all files, reading and merging them one by one
    for file in all_files:
        file_path = os.path.join(input_folder, file)
        print(f"Processing file: {file_path}")

        # Read the Excel file
        df = pd.read_excel(file_path)

        # Append its rows to the merged data
        combined_df = pd.concat([combined_df, df], ignore_index=True)

    # Save the merged data to a new Excel file
    combined_df.to_excel(output_file, index=False)
    print(f"The merge is completed, and the result has been saved to: {output_file}")

3. Call the function and run it

Call the merge_excel_files function above and pass in the folder path and the output file path:

# Specify the input folder path and the output file path
input_folder = 'path_to_your_excel_files'  # Replace with your folder path
output_file = 'merged_output.xlsx'         # Output file path

# Call the merge function
merge_excel_files(input_folder, output_file)

Code description

Get the file list: os.listdir returns the entries in the specified directory, and the list comprehension keeps only the names that end in .xlsx.

Read and merge data: pandas.read_excel reads the data of each Excel file, and pd.concat appends it to one large DataFrame. ignore_index=True renumbers the rows of the merged data from 0, so the row labels of the individual files are not repeated.

Save the merge result: Finally, the to_excel method saves the merged data to a new Excel file.
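The effect of ignore_index=True is easiest to see on small in-memory data. The following sketch uses two made-up DataFrames (first and second are illustrative stand-ins for the contents of two Excel files) and compares the row labels with and without the parameter.

```python
import pandas as pd

# Two small frames standing in for the contents of two Excel files
first = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
second = pd.DataFrame({"name": ["c", "d"], "value": [3, 4]})

# Without ignore_index each frame keeps its own row labels
kept = pd.concat([first, second])
print(kept.index.tolist())        # [0, 1, 0, 1]

# With ignore_index=True the merged rows are renumbered from 0
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Note that pd.concat accepts a list of any length, so the DataFrames from all files could also be collected in a list and concatenated in a single call after the loop.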

Execution results

After executing the above code, you will see the following output:

Processing file: path_to_your_excel_files/
Processing file: path_to_your_excel_files/
Processing file: path_to_your_excel_files/
The merge is completed, and the result has been saved to: merged_output.xlsx

The merged data will be saved in the file merged_output.xlsx.
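One thing to watch for: if the output file is written into the same folder as the input files, a later run would pick it up as another input, because it also ends in .xlsx. The sketch below shows one possible way to build the file list more defensively with pathlib. The function name list_excel_inputs is an assumption for this example, and it also skips the "~$" lock files that Excel creates while a workbook is open.

```python
from pathlib import Path


def list_excel_inputs(input_folder, output_file):
    """Return the .xlsx files in input_folder that are safe to merge."""
    output_path = Path(output_file).resolve()
    selected = []
    # Sorting gives a stable, predictable merge order
    for path in sorted(Path(input_folder).glob("*.xlsx")):
        # Excel creates "~$name.xlsx" lock files while a workbook is open
        if path.name.startswith("~$"):
            continue
        # Skip the result of a previous merge
        if path.resolve() == output_path:
            continue
        selected.append(path)
    return selected
```

The returned paths can be passed straight to pd.read_excel, which accepts Path objects as well as strings.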
