SoFunction
Updated on 2025-03-02

Automatically merging Excel files with pandas in Python

Preface

In data analysis and processing, you often encounter situations where multiple Excel files need to be merged. This article introduces a method to automatically merge Excel files using the Pandas library and Glob module in the Python programming language. By writing concise scripts, we can efficiently search, read, merge and save a large number of Excel files, greatly improving the efficiency of data processing.

Keywords: Python, Pandas, Glob, Excel file merging

Main text

1. Introduction

When working with large datasets, data is often scattered across multiple Excel files. Merging these files by hand is time-consuming and error-prone. Automating the process saves time and reduces human error. This article shows how to do so with the pandas library and the glob module in Python.

2. Method

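  • Import the necessary libraries:
import pandas as pd
import glob
  • Initialize a list to store the paths of the Excel files that are found:
file_paths = []
  • Use the glob module to search for all Excel files in the specified directory and store their paths in the list:
file_paths = glob.glob(r'./test/*.xlsx')
  • Print the list of files found to confirm that they have been identified correctly:
for file_path in file_paths:
    print(f"Found file: {file_path}")
  • Read the first Excel file in the list and initialize a DataFrame to store the merged data:
first_file = file_paths[0]
initial_data = pd.read_excel(first_file)
  • Read each remaining file and append its data to the DataFrame with concat:
for file_path in file_paths[1:]:
    additional_data = pd.read_excel(file_path)
    initial_data = pd.concat([initial_data, additional_data], ignore_index=True)
  • Print the index of the final DataFrame to verify that the data has been merged correctly:
print("Final DataFrame index:", initial_data.index)
  • Using pandas' ExcelWriter, write the merged data to a new Excel file:
# '' stands for the output file path
with pd.ExcelWriter('') as writer:
    initial_data.to_excel(writer, sheet_name='Sheet1', index=False)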
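
The steps above can also be wrapped in a small function. The following is a minimal sketch rather than the article's own script: the names merge_excel_files, pattern and output_path are hypothetical, the file list is sorted so that the merge order is reproducible, and all files are combined in a single concat call instead of one call per file. It assumes an Excel engine such as openpyxl is installed for reading and writing .xlsx files.

```python
import glob

import pandas as pd


def merge_excel_files(pattern, output_path):
    """Merge every Excel file matching `pattern` into `output_path`.

    Hypothetical helper for illustration; returns the merged DataFrame.
    """
    # glob returns paths in arbitrary order, so sort for a stable result.
    file_paths = sorted(glob.glob(pattern))
    if not file_paths:
        raise FileNotFoundError(f"No files match {pattern!r}")

    # Read each file once, then stack all frames in a single concat call.
    frames = [pd.read_excel(path) for path in file_paths]
    merged = pd.concat(frames, ignore_index=True)

    # index=False keeps the DataFrame index out of the output file.
    merged.to_excel(output_path, sheet_name='Sheet1', index=False)
    return merged
```

Raising an error when nothing matches avoids the less obvious IndexError that file_paths[0] would produce on an empty list.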

3. Summary

After the script runs, all the Excel files are merged into a new file named "". This file contains the data from all the original files; because index=False is passed to to_excel, the DataFrame index is not written to it.

The method presented in this article provides a fast, automated way to merge Excel files, which is especially useful when many files are involved. Because it relies only on pandas and the glob module, the script is easy to adapt to other file paths and file types.
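
As one way to adapt the search step to other directories and file types, the pattern can be built from a directory and a list of extensions. The sketch below is only an assumption about how such an extension might look; find_files, directory and extensions are hypothetical names, not part of the original script.

```python
import glob
import os


def find_files(directory, extensions=('.xlsx', '.xls')):
    """Return a sorted list of files in `directory` with the given extensions.

    Hypothetical helper: the name and parameters are not from the article.
    """
    paths = []
    for ext in extensions:
        # Build a pattern such as "./test/*.xlsx" for each extension.
        paths.extend(glob.glob(os.path.join(directory, f'*{ext}')))
    return sorted(paths)
```

For example, find_files('./test') would collect both .xlsx and .xls workbooks. Note that pandas reads legacy .xls files with a different engine (xlrd) than .xlsx files.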

The process of automating the merge of Excel files not only improves the efficiency of data processing, but also reduces the possibility of human error. This method can be widely used in all stages of data cleaning, preprocessing and analysis.

The complete script, which uses pandas and the glob module to merge multiple Excel files, is as follows:

import pandas as pd
import glob

# Search for all Excel files in ./test and store their paths in a list
file_paths = glob.glob(r'./test/*.xlsx')

# Print the list of files found
for file_path in file_paths:
    print(f"Found file: {file_path}")

# Read the first Excel file and initialize the result DataFrame
first_file = file_paths[0]
initial_data = pd.read_excel(first_file)

# Merge the remaining Excel files into the result DataFrame
for file_path in file_paths[1:]:
    additional_data = pd.read_excel(file_path)
    # Use concat to append the data; ignore_index=True renumbers the rows
    initial_data = pd.concat([initial_data, additional_data], ignore_index=True)

# Print the index of the final DataFrame to verify the merge
print("Final DataFrame index:", initial_data.index)

# Use ExcelWriter to write the merged data to a new Excel file
# ('' stands for the output file path)
with pd.ExcelWriter('') as writer:
    initial_data.to_excel(writer, sheet_name='Sheet1', index=False)

print("Data has been successfully merged and saved to ''.")

This code first uses the glob module to find all .xlsx files in the ./test directory and stores their paths in a list. It then reads the first file in the list into a DataFrame. Next, it iterates over the remaining files and appends each one's data to that DataFrame with the concat function, passing ignore_index=True so that the merged rows are renumbered consecutively while keeping their order. Finally, it uses ExcelWriter to write the merged data to a new Excel file.
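
The effect of ignore_index can be seen with two small in-memory DataFrames. The data below is made up and stands in for the contents of two Excel files; the row order is the same in both results, and only the index labels differ.

```python
import pandas as pd

# Two small in-memory frames stand in for two Excel files (made-up data).
first = pd.DataFrame({'id': [1, 2], 'income': [100, 200]})
second = pd.DataFrame({'id': [3, 4], 'income': [300, 400]})

# Without ignore_index, each frame keeps its own 0..n-1 labels.
kept = pd.concat([first, second])
print(list(kept.index))        # [0, 1, 0, 1]

# With ignore_index=True, the result gets a fresh consecutive index.
renumbered = pd.concat([first, second], ignore_index=True)
print(list(renumbered.index))  # [0, 1, 2, 3]
```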

Merging multiple Excel files with pandas' merge function instead of concat

The difference between the merge function of Pandas and the concat function

pandas' merge and concat functions are both tools for combining data, but they work differently and suit different scenarios.

concat function

The concat function joins multiple DataFrame objects along one axis (rows or columns). It does not match rows by the values in a key column; it simply stacks one DataFrame below or beside another, aligning only on the labels of the other axis (column names when stacking rows, the row index when placing frames side by side). It is therefore suited to stacking DataFrames that share the same structure, not to combining rows by key.
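
A short sketch of both directions, using made-up in-memory data rather than real Excel files:

```python
import pandas as pd

a = pd.DataFrame({'id': [1, 2], 'age': [30, 40]})
b = pd.DataFrame({'id': [2, 3], 'age': [41, 50]})

# axis=0 (the default): stack rows. id 2 appears twice because
# concat does not match rows by value.
rows = pd.concat([a, b], ignore_index=True)
print(rows.shape)          # (4, 2)

# axis=1: place the frames side by side, aligned on the row index.
# add_suffix renames b's columns so the names do not collide.
cols = pd.concat([a, b.add_suffix('_b')], axis=1)
print(list(cols.columns))  # ['id', 'age', 'id_b', 'age_b']
```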

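merge function

The merge function combines two DataFrames by matching rows on one or more key columns (or on the index), in the same way as a database join. Its how parameter selects the join type, for example 'inner', 'outer', 'left' or 'right'.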

How to merge multiple Excel files using the merge function

To merge multiple Excel files with the merge function, first read each Excel file into a DataFrame with pandas.read_excel, then use merge to join the DataFrames on their shared keys. Here is a simple example:

import pandas as pd

# Read two Excel files
df1 = pd.read_excel('./test/')
df2 = pd.read_excel('./test/')

# Merge the DataFrames on the shared key columns
merged_df = pd.merge(df1, df2, on=['id', 'age', 'sex', 'region', 'income', 'married'], how='outer')

# Save the merged data to an Excel file
merged_df.to_excel('merged_file.xlsx', index=False)

In the above code, on=[...] lists the key columns used for the merge, and how='outer' specifies the join type: an outer join keeps every row from both DataFrames and fills in missing values where a key has no match. With how='inner', only rows whose keys appear in both DataFrames would be kept. Finally, the to_excel function saves the merged DataFrame to a new Excel file.
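
The difference between the join types can be seen on a small made-up example. The frames and the single key column id below are illustrative and are not the files used in the article.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'age': [30, 40, 50]})
right = pd.DataFrame({'id': [2, 3, 4], 'income': [200, 300, 400]})

# Inner join: only ids present in both frames survive.
inner = pd.merge(left, right, on='id', how='inner')
print(list(inner['id']))             # [2, 3]

# Outer join: every id from either frame is kept;
# columns with no matching row are filled with NaN.
outer = pd.merge(left, right, on='id', how='outer')
print(sorted(outer['id']))           # [1, 2, 3, 4]
print(outer['income'].isna().sum())  # 1
```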

Note that when using the merge function, the key columns must be present in every DataFrame being merged and must have compatible data types. If a key has different data types in different DataFrames, convert it to a common type before merging. Additionally, if a key value is duplicated in one of the DataFrames, every matching pair of rows appears in the result, so duplicates may need to be removed or aggregated first to avoid distorting the merge.
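
A minimal sketch of both precautions, assuming made-up data in which one file stores the key as text and contains a repeated row. Without the conversion, the text and integer keys would not be treated as equal, and pandas may refuse the merge with an error.

```python
import pandas as pd

# Made-up data: id is stored as text in df1 and as integers in df2.
df1 = pd.DataFrame({'id': ['1', '2', '2'], 'age': [30, 40, 40]})
df2 = pd.DataFrame({'id': [1, 2], 'income': [100, 200]})

# Convert the key to a common type before merging.
df1['id'] = df1['id'].astype('int64')

# Drop exact duplicate rows so that each id appears once in df1.
df1 = df1.drop_duplicates()

# validate='one_to_one' makes merge raise pandas.errors.MergeError
# if a key is still duplicated on either side.
merged = pd.merge(df1, df2, on='id', how='outer', validate='one_to_one')
print(len(merged))  # 2
```

Whether dropping duplicates is appropriate depends on the data; in other cases the duplicated rows may need to be aggregated instead.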
