SoFunction
Updated on 2025-04-17

Detailed explanation of Pandas+openpyxl for Excel processing

1. Read multiple Excel files and merge

Suppose you have a folder that contains multiple Excel files, and you want to merge these files into a DataFrame.

import os
import pandas as pd

# Folder path
folder_path = 'path/to/your/excel/files'
# Get all Excel files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]
# Read each Excel file into its own DataFrame
frames = []
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    frames.append(pd.read_excel(file_path))
# Merge everything in one concat call (raises ValueError if no files were found)
all_data = pd.concat(frames, ignore_index=True)
# View the merged data
print(all_data.head())
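After a merge it is often useful to know which file each row came from. The sketch below is one possible way to do that, not part of the original article: the helper name `merge_with_source` and the `source_file` column are hypothetical, and small in-memory DataFrames stand in for the results of `pd.read_excel`.

```python
import pandas as pd


def merge_with_source(frames_by_name):
    """Concatenate DataFrames, tagging each row with its source name (hypothetical helper)."""
    tagged = [df.assign(source_file=name) for name, df in frames_by_name.items()]
    return pd.concat(tagged, ignore_index=True)


# In-memory stand-ins for what pd.read_excel(file_path) would return
frames = {
    "a.xlsx": pd.DataFrame({"x": [1, 2]}),
    "b.xlsx": pd.DataFrame({"x": [3]}),
}
merged = merge_with_source(frames)
print(merged)
```

In the loop above, the dictionary could be filled with `frames[file] = pd.read_excel(file_path)` instead of appending to a list.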

2. Batch processing of multiple Excel files

Suppose you need to do the same processing on multiple Excel files (for example, adding a column, filtering data, etc.).

import os
import pandas as pd

# Folder paths
folder_path = 'path/to/your/excel/files'
output_folder = 'path/to/output/folder'
# Make sure the output folder exists
os.makedirs(output_folder, exist_ok=True)
# Get all Excel files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]
# Process each Excel file
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_excel(file_path)
    # Add a column
    df['New_Column'] = 'Some Value'
    # Filter data
    filtered_df = df[df['Some_Column'] > 100]
    # Save processed data as .xlsx (pandas 2.0+ can no longer write legacy .xls files)
    output_name = os.path.splitext(file)[0] + '.xlsx'
    output_file_path = os.path.join(output_folder, output_name)
    filtered_df.to_excel(output_file_path, index=False)
print("Processing complete.")

3. Extract specific information from multiple Excel files

Suppose you need to extract specific information from multiple Excel files (for example, data from a specific cell).

import os
import pandas as pd

# Folder path
folder_path = 'path/to/your/excel/files'
# Get all Excel files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]
# Store the results
results = []
# Extract specific information from each Excel file
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_excel(file_path)
    # Take the value in the first data row and first column
    # (the header row is not counted as a data row)
    specific_value = df.iloc[0, 0]
    # Store the result in the list
    results.append((file, specific_value))
# Print the results
for file, value in results:
    print(f"File: {file}, Specific Value: {value}")
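One detail worth checking here: with its default settings `pd.read_excel` treats the first spreadsheet row as the header, so `df.iloc[0, 0]` returns cell A2 rather than A1. The following sketch illustrates the difference with a tiny workbook built in memory; the sample values and variable names are made up for illustration.

```python
import io

import openpyxl
import pandas as pd

# Build a two-row workbook in memory instead of reading a real file
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["name", "score"])  # spreadsheet row 1 (header)
ws.append(["alice", 90])      # spreadsheet row 2 (first data row)
buf = io.BytesIO()
wb.save(buf)

buf.seek(0)
with_header = pd.read_excel(buf, engine="openpyxl")           # row 1 becomes the column names
buf.seek(0)
raw = pd.read_excel(buf, header=None, engine="openpyxl")      # row 1 is kept as data

first_data_value = with_header.iloc[0, 0]  # cell A2
cell_a1 = raw.iloc[0, 0]                   # cell A1
print(first_data_value, cell_a1)
```

If the value you need really is the literal cell A1, passing `header=None` (or reading the cell with openpyxl) is one way to get it.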

4. Use openpyxl to process multiple Excel files

If you need to control Excel files more granularly (for example, modifying specific cells, formatting, etc.), you can use the openpyxl library.

import os
import openpyxl

# Folder paths
folder_path = 'path/to/your/excel/files'
output_folder = 'path/to/output/folder'
# Make sure the output folder exists
os.makedirs(output_folder, exist_ok=True)
# Get all .xlsx files in the folder (openpyxl cannot open legacy .xls files)
excel_files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]
# Process each Excel file
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    workbook = openpyxl.load_workbook(file_path)
    # Work on the active worksheet
    sheet = workbook.active
    # Modify a specific cell
    sheet['A1'] = 'New Value'
    # Save the processed file
    output_file_path = os.path.join(output_folder, file)
    workbook.save(output_file_path)
print("Processing complete.")

5. Merge multiple Excel files into different worksheets in one workbook

Suppose you have multiple Excel files and want to merge them into different worksheets in a new Excel workbook.

import os
import pandas as pd

# Folder path and output workbook
folder_path = 'path/to/your/excel/files'
output_file = 'merged_workbook.xlsx'
# Get all Excel files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]
# Create a new ExcelWriter object
with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
    # Process each Excel file and write its data to a separate worksheet
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        df = pd.read_excel(file_path)
        # Use the file name (without extension) as the worksheet name;
        # Excel limits worksheet names to 31 characters
        sheet_name = os.path.splitext(file)[0][:31]
        # Write data
        df.to_excel(writer, sheet_name=sheet_name, index=False)
print("Merging complete.")
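To check the result, the merged workbook can be read back with `sheet_name=None`, which makes `pd.read_excel` return a dictionary mapping each worksheet name to a DataFrame. The sketch below assumes an in-memory buffer in place of `merged_workbook.xlsx`, and the sheet names `first` and `second` are placeholders.

```python
import io

import pandas as pd

# Write two small DataFrames to separate worksheets of one in-memory workbook
buf = io.BytesIO()
with pd.ExcelWriter(buf, engine="openpyxl") as writer:
    pd.DataFrame({"x": [1]}).to_excel(writer, sheet_name="first", index=False)
    pd.DataFrame({"y": [2, 3]}).to_excel(writer, sheet_name="second", index=False)

# Read every worksheet back; the result is a dict keyed by worksheet name
buf.seek(0)
sheets = pd.read_excel(buf, sheet_name=None, engine="openpyxl")
for name, frame in sheets.items():
    print(name, frame.shape)
```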

6. Batch process multiple Excel files and perform data cleaning

Suppose you need to clean the data in multiple Excel files, for example by deleting empty rows and filling in missing values.

import os
import pandas as pd

# Folder paths
folder_path = 'path/to/your/excel/files'
output_folder = 'path/to/output/folder'
# Make sure the output folder exists
os.makedirs(output_folder, exist_ok=True)
# Get all Excel files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]
# Process each Excel file
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_excel(file_path)
    # Delete rows that are entirely empty
    df.dropna(how='all', inplace=True)
    # Fill the remaining missing values with 0
    df.fillna(0, inplace=True)
    # Save processed data as .xlsx (pandas 2.0+ can no longer write legacy .xls files)
    output_name = os.path.splitext(file)[0] + '.xlsx'
    output_file_path = os.path.join(output_folder, output_name)
    df.to_excel(output_file_path, index=False)
print("Data cleaning complete.")
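The effect of the two cleaning calls is easier to see on a small example. The sketch below uses a made-up in-memory DataFrame: `dropna(how='all')` removes only rows in which every value is missing, and `fillna(0)` then replaces the remaining gaps.

```python
import numpy as np
import pandas as pd

# Row 0 has one gap, row 1 is entirely empty, row 2 has one gap
df = pd.DataFrame({
    "a": [1.0, np.nan, np.nan],
    "b": [np.nan, np.nan, 5.0],
})

# Drop only the fully empty row, then fill the remaining gaps with 0
cleaned = df.dropna(how="all").fillna(0)
print(cleaned)
```

Note that the surviving rows keep their original index labels (0 and 2 here); call `reset_index(drop=True)` if a contiguous index is needed.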

7. Extract specific columns from multiple Excel files and merge them

Suppose you need to extract specific columns from multiple Excel files and merge them into a new DataFrame.

import os
import pandas as pd

# Folder path
folder_path = 'path/to/your/excel/files'
# Get all Excel files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]
# Read each Excel file one by one, keeping only the specific columns
frames = []
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_excel(file_path, usecols=['Column1', 'Column2'])
    frames.append(df)
# Merge the extracted columns in one concat call
all_data = pd.concat(frames, ignore_index=True)
# View the merged data
print(all_data.head())

8. Batch rename worksheets in multiple Excel files

Suppose you need to batch rename the worksheet names in multiple Excel files.

import os
import openpyxl

# Folder paths
folder_path = 'path/to/your/excel/files'
output_folder = 'path/to/output/folder'
# Make sure the output folder exists
os.makedirs(output_folder, exist_ok=True)
# Get all .xlsx files in the folder (openpyxl cannot open legacy .xls files)
excel_files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]
# Process each Excel file
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    workbook = openpyxl.load_workbook(file_path)
    # Rename the worksheet if it exists
    if 'OldSheetName' in workbook.sheetnames:
        sheet = workbook['OldSheetName']
        sheet.title = 'NewSheetName'
    # Save the processed file
    output_file_path = os.path.join(output_folder, file)
    workbook.save(output_file_path)
print("Sheet renaming complete.")

9. Batch export of Excel data to CSV files

Suppose you need to batch export data from multiple Excel files into CSV files.

import os
import pandas as pd

# Folder paths
folder_path = 'path/to/your/excel/files'
output_folder = 'path/to/output/csvs'
# Make sure the output folder exists
os.makedirs(output_folder, exist_ok=True)
# Get all Excel files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]
# Process each Excel file
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    df = pd.read_excel(file_path)
    # Generate the output file path
    base_name = os.path.splitext(file)[0]
    output_file_path = os.path.join(output_folder, f'{base_name}.csv')
    # Export as a CSV file
    df.to_csv(output_file_path, index=False)
print("Export to CSV complete.")

10. Batch process multiple Excel files and perform data analysis

Suppose you need to perform data analysis on multiple Excel files, such as calculating sums, averages, etc.

import os
import pandas as pd

# Folder path
folder_path = 'path/to/your/excel/files'
# Get all Excel files in the folder
excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]
# Read each Excel file into its own DataFrame
frames = []
for file in excel_files:
    file_path = os.path.join(folder_path, file)
    frames.append(pd.read_excel(file_path))
# Merge everything in one concat call
all_data = pd.concat(frames, ignore_index=True)
# Conduct data analysis
total_sum = all_data['Some_Column'].sum()
average_value = all_data['Some_Column'].mean()
# Print the results
print(f"Total Sum: {total_sum}")
print(f"Average Value: {average_value}")
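If per-file figures are wanted alongside the overall totals, one option is to pass a dictionary to `pd.concat`, so that the file name becomes an index level that can be grouped on. This is a sketch under assumptions: the file names, the values, and the `source_file` and `row` level names are all invented for illustration.

```python
import pandas as pd

# In-memory stand-ins for what pd.read_excel would return for each file
frames = {
    "jan.xlsx": pd.DataFrame({"Some_Column": [100, 200]}),
    "feb.xlsx": pd.DataFrame({"Some_Column": [300]}),
}

# Dict keys become the outer index level, named "source_file" here
all_data = pd.concat(frames, names=["source_file", "row"])

total_sum = all_data["Some_Column"].sum()
average_value = all_data["Some_Column"].mean()
# Sum per source file, grouped on the outer index level
per_file_sum = all_data.groupby(level="source_file")["Some_Column"].sum()

print(f"Total Sum: {total_sum}")
print(f"Average Value: {average_value}")
print(per_file_sum)
```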
