1. Read multiple Excel files and merge
Suppose you have a folder that contains multiple Excel files, and you want to merge them into a single DataFrame.
    import os
    import pandas as pd

    # Folder path
    folder_path = 'path/to/your/excel/files'

    # Get all Excel files in the folder
    excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]

    # Read each Excel file and collect the resulting DataFrames
    frames = []
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        frames.append(pd.read_excel(file_path))

    # Concatenate once; fall back to an empty DataFrame if no files were found
    all_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

    # View the merged data
    print(all_data.head())
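Every example in this article starts by listing the workbooks in a folder. As a minimal sketch, that step could be factored into a helper. The name `list_excel_files` is hypothetical and not part of pandas or openpyxl. It matches extensions case-insensitively and skips the `~$` lock files that Excel creates while a workbook is open, since those are not valid workbooks.

```python
import os


def list_excel_files(folder_path, extensions=('.xlsx', '.xls')):
    """Return sorted full paths of the Excel files directly inside folder_path."""
    paths = []
    for name in os.listdir(folder_path):
        # Excel writes a temporary "~$<name>" lock file while a workbook is open
        if name.startswith('~$'):
            continue
        # Compare in lower case so that "REPORT.XLSX" is also picked up
        if name.lower().endswith(extensions):
            paths.append(os.path.join(folder_path, name))
    return sorted(paths)
```

The helper returns full paths, so a loop can pass each item straight to `pd.read_excel` without calling `os.path.join` again.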
2. Batch processing of multiple Excel files
Suppose you need to do the same processing on multiple Excel files (for example, adding a column, filtering data, etc.).
    import os
    import pandas as pd

    # Folder paths
    folder_path = 'path/to/your/excel/files'
    output_folder = 'path/to/output/folder'

    # Make sure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    # Get all Excel files in the folder
    excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]

    # Process each Excel file
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        df = pd.read_excel(file_path)

        # Add a column
        df['New_Column'] = 'Some Value'

        # Filter data
        filtered_df = df[df['Some_Column'] > 100]

        # Save processed data as .xlsx (recent pandas versions cannot write the legacy .xls format)
        output_name = os.path.splitext(file)[0] + '.xlsx'
        output_file_path = os.path.join(output_folder, output_name)
        filtered_df.to_excel(output_file_path, index=False)

    print("Processing complete.")
3. Extract specific information from multiple Excel files
Suppose you need to extract specific information from multiple Excel files (for example, data from a specific cell).
    import os
    import pandas as pd

    # Folder path
    folder_path = 'path/to/your/excel/files'

    # Get all Excel files in the folder
    excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]

    # Store the results
    results = []

    # Extract specific information from each Excel file
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        df = pd.read_excel(file_path)

        # Suppose we need the value in the first row and first column.
        # The first worksheet row is used as the header, so this is the first data row.
        specific_value = df.iloc[0, 0]

        # Store the result in the list
        results.append((file, specific_value))

    # Print the results
    for file, value in results:
        print(f"File: {file}, Specific Value: {value}")
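Positional lookups such as `df.iloc[0, 0]` raise an `IndexError` when a sheet has no data rows. The following is a minimal sketch of a guarded lookup. The helper name `cell_or_none` is hypothetical, and the sample DataFrame stands in for the result of `pd.read_excel`. Note that `pd.read_excel` treats the first worksheet row as column headers by default, so position `[0, 0]` is the first data row; passing `header=None` would make it correspond to cell A1.

```python
import pandas as pd


def cell_or_none(df, row=0, col=0):
    """Return df.iloc[row, col], or None when the position is outside the frame."""
    n_rows, n_cols = df.shape
    if 0 <= row < n_rows and 0 <= col < n_cols:
        return df.iloc[row, col]
    return None


# Stand-in for the result of pd.read_excel(file_path)
sample = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Score': [90, 85]})

first_value = cell_or_none(sample)      # first data row, first column
missing = cell_or_none(pd.DataFrame())  # an empty sheet yields None
print(first_value, missing)
```

Returning `None` lets the batch loop record the gap and continue with the next file instead of stopping at the first empty workbook.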
4. Use openpyxl to process multiple Excel files
If you need finer-grained control over Excel files (for example, modifying specific cells or applying formatting), you can use the openpyxl library.
    import os
    import openpyxl

    # Folder paths
    folder_path = 'path/to/your/excel/files'
    output_folder = 'path/to/output/folder'

    # Make sure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    # Get all .xlsx files in the folder (openpyxl cannot read the legacy .xls format)
    excel_files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]

    # Process each Excel file
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        workbook = openpyxl.load_workbook(file_path)
        sheet = workbook.active

        # Modify a specific cell
        sheet['A1'] = 'New Value'

        # Save the processed file
        output_file_path = os.path.join(output_folder, file)
        workbook.save(output_file_path)

    print("Processing complete.")
5. Merge multiple Excel files into different worksheets in one workbook
Suppose you have multiple Excel files and want to merge them into different worksheets in a new Excel workbook.
    import os
    import pandas as pd

    # Folder path and output workbook
    folder_path = 'path/to/your/excel/files'
    output_file = 'merged_workbook.xlsx'

    # Get all Excel files in the folder
    excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]

    # Create a new ExcelWriter object
    with pd.ExcelWriter(output_file, engine='openpyxl') as writer:
        # Process each Excel file and write its data to a separate worksheet
        for file in excel_files:
            file_path = os.path.join(folder_path, file)
            df = pd.read_excel(file_path)

            # Use the file name as the worksheet name (Excel allows at most 31 characters)
            sheet_name = os.path.splitext(file)[0][:31]

            # Write data
            df.to_excel(writer, sheet_name=sheet_name, index=False)

    print("Merging complete.")
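File names do not always make valid worksheet names: Excel limits names to 31 characters, rejects the characters `[ ] : * ? / \`, and requires each name in a workbook to be unique. The following is a rough sketch of one way to derive a safe name. The helper `make_sheet_name` and the two constants are hypothetical names introduced for illustration.

```python
import re

MAX_SHEET_NAME_LENGTH = 31  # Excel's limit for worksheet names
INVALID_SHEET_CHARS = re.compile(r'[\[\]:*?/\\]')


def make_sheet_name(base_name, used_names):
    """Build a valid, unique worksheet name and record it in used_names."""
    # Replace forbidden characters; names may not begin or end with an apostrophe
    cleaned = INVALID_SHEET_CHARS.sub('_', base_name).strip("'")
    name = cleaned[:MAX_SHEET_NAME_LENGTH] or 'Sheet'
    counter = 1
    # Excel compares sheet names case-insensitively, so track lower-case names
    while name.lower() in used_names:
        suffix = f'_{counter}'
        name = cleaned[:MAX_SHEET_NAME_LENGTH - len(suffix)] + suffix
        counter += 1
    used_names.add(name.lower())
    return name
```

In the merge loop, one `used_names = set()` created before the loop would be passed on every call, and the returned value used as `sheet_name` in `df.to_excel`.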
6. Batch-process multiple Excel files and clean the data
Suppose you need to clean the data in multiple Excel files, for example by deleting empty rows and filling in missing values.
    import os
    import pandas as pd

    # Folder paths
    folder_path = 'path/to/your/excel/files'
    output_folder = 'path/to/output/folder'

    # Make sure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    # Get all Excel files in the folder
    excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]

    # Process each Excel file
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        df = pd.read_excel(file_path)

        # Delete rows in which every cell is empty
        df = df.dropna(how='all')

        # Fill in missing values
        df = df.fillna(0)

        # Save processed data as .xlsx (recent pandas versions cannot write the legacy .xls format)
        output_name = os.path.splitext(file)[0] + '.xlsx'
        output_file_path = os.path.join(output_folder, output_name)
        df.to_excel(output_file_path, index=False)

    print("Data cleaning complete.")
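Filling every missing value with 0 also writes 0 into text and date columns, which is often not what you want. As one possible variation, the sketch below fills only numeric columns and leaves other gaps untouched. The helper name `clean_frame` and the sample data are assumptions made for illustration.

```python
import pandas as pd


def clean_frame(df):
    """Drop fully empty rows, then fill gaps in numeric columns with 0."""
    cleaned = df.dropna(how='all').copy()
    numeric_cols = cleaned.select_dtypes(include='number').columns
    if len(numeric_cols) > 0:
        cleaned[numeric_cols] = cleaned[numeric_cols].fillna(0)
    return cleaned


# Stand-in for the result of pd.read_excel(file_path)
raw = pd.DataFrame({
    'amount': [1.0, None, 3.0, None],
    'label': ['x', 'y', None, None],
})
cleaned = clean_frame(raw)
print(cleaned)
```

Keeping the cleaning logic in a function that takes and returns a DataFrame makes it easy to check on small in-memory samples before running it over a whole folder.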
7. Extract specific columns from multiple Excel files and merge them
Suppose you need to extract specific columns from multiple Excel files and merge them into a new DataFrame.
    import os
    import pandas as pd

    # Folder path
    folder_path = 'path/to/your/excel/files'

    # Get all Excel files in the folder
    excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]

    # Read each Excel file, keeping only the specific columns
    frames = []
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        df = pd.read_excel(file_path, usecols=['Column1', 'Column2'])
        frames.append(df)

    # Concatenate once; fall back to an empty DataFrame if no files were found
    all_data = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

    # View the merged data
    print(all_data.head())
8. Batch rename worksheets in multiple Excel files
Suppose you need to batch rename the worksheet names in multiple Excel files.
    import os
    import openpyxl

    # Folder paths
    folder_path = 'path/to/your/excel/files'
    output_folder = 'path/to/output/folder'

    # Make sure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    # Get all .xlsx files in the folder (openpyxl cannot read the legacy .xls format)
    excel_files = [f for f in os.listdir(folder_path) if f.endswith('.xlsx')]

    # Process each Excel file
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        workbook = openpyxl.load_workbook(file_path)

        # Rename the worksheet if it exists
        if 'OldSheetName' in workbook.sheetnames:
            sheet = workbook['OldSheetName']
            sheet.title = 'NewSheetName'

        # Save the processed file
        output_file_path = os.path.join(output_folder, file)
        workbook.save(output_file_path)

    print("Sheet renaming complete.")
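The rename step can be isolated into a small function that reports whether it changed anything, which makes it easier to log the files that were skipped. This is only a sketch: `rename_sheet` is a hypothetical helper, and the in-memory `Workbook()` stands in for a file loaded with `openpyxl.load_workbook`. It also skips the rename when the target name is already taken, an extra check that the example above does not perform.

```python
from openpyxl import Workbook


def rename_sheet(workbook, old_name, new_name):
    """Rename a worksheet; return True if renamed, False if skipped."""
    if old_name not in workbook.sheetnames:
        return False
    if new_name in workbook.sheetnames:
        # Skip rather than let the two names collide
        return False
    workbook[old_name].title = new_name
    return True


# In-memory workbook standing in for openpyxl.load_workbook(file_path)
wb = Workbook()
wb.active.title = 'OldSheetName'

renamed = rename_sheet(wb, 'OldSheetName', 'NewSheetName')
print(renamed, wb.sheetnames)
```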
9. Batch export of Excel data to CSV files
Suppose you need to batch export data from multiple Excel files into CSV files.
    import os
    import pandas as pd

    # Folder paths
    folder_path = 'path/to/your/excel/files'
    output_folder = 'path/to/output/csvs'

    # Make sure the output folder exists
    os.makedirs(output_folder, exist_ok=True)

    # Get all Excel files in the folder
    excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]

    # Process each Excel file
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        df = pd.read_excel(file_path)

        # Generate the output file path
        base_name = os.path.splitext(file)[0]
        output_file_path = os.path.join(output_folder, f'{base_name}.csv')

        # Export as a CSV file
        df.to_csv(output_file_path, index=False)

    print("Export to CSV complete.")
10. Batch-process multiple Excel files and perform data analysis
Suppose you need to perform data analysis on multiple Excel files, such as calculating sums, averages, etc.
    import os
    import pandas as pd

    # Folder path
    folder_path = 'path/to/your/excel/files'

    # Get all Excel files in the folder
    excel_files = [f for f in os.listdir(folder_path) if f.endswith(('.xlsx', '.xls'))]

    # Read each Excel file and collect the resulting DataFrames
    frames = []
    for file in excel_files:
        file_path = os.path.join(folder_path, file)
        frames.append(pd.read_excel(file_path))

    # Concatenate once (pd.concat raises ValueError if no files were found)
    all_data = pd.concat(frames, ignore_index=True)

    # Conduct data analysis
    total_sum = all_data['Some_Column'].sum()
    average_value = all_data['Some_Column'].mean()

    # Print the results
    print(f"Total Sum: {total_sum}")
    print(f"Average Value: {average_value}")
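Once the files are merged, the totals no longer show which workbook each row came from. As an illustrative sketch, each DataFrame can be tagged with its file name before concatenation so that the same statistics are available per file. The helper `summarize_by_file`, the `source_file` column, and the sample file names are all assumptions made for this example.

```python
import pandas as pd


def summarize_by_file(frames_by_file, column):
    """Return the sum and mean of `column` for each source file."""
    tagged = [
        df.assign(source_file=name) for name, df in frames_by_file.items()
    ]
    combined = pd.concat(tagged, ignore_index=True)
    return combined.groupby('source_file')[column].agg(['sum', 'mean'])


# Stand-ins for DataFrames returned by pd.read_excel, keyed by file name
frames_by_file = {
    'jan.xlsx': pd.DataFrame({'Some_Column': [100, 200]}),
    'feb.xlsx': pd.DataFrame({'Some_Column': [50, 150, 250]}),
}
summary = summarize_by_file(frames_by_file, 'Some_Column')
print(summary)
```

The result has one row per file with `sum` and `mean` columns, which can be written back out with `to_excel` or `to_csv` like any other DataFrame.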