Preface
In daily data processing and analysis work, Excel files are one of the most common formats we deal with. With Python we can automate all kinds of operations on Excel files and improve efficiency.
This article shares 20 practical Excel automation scripts to help beginners master these skills more easily.
1. Batch fill Excel cells
import pandas as pd

# Batch fill the cells of the specified column
def fill_column(file_path, column_name, value):
    df = pd.read_excel(file_path)
    df[column_name] = value  # Fill all cells of the specified column with value
    df.to_excel(file_path, index=False)

fill_column('', 'Remark', 'Processed')  # the example path was omitted in the original; supply your own .xlsx file
print("The Remark column has been populated successfully!")
Explanation
This script fills every cell of the specified column (here, the 'Remark' column) with the value 'Processed'. When processing large amounts of data, it is common to need a whole column marked uniformly, and this script does exactly that.
2. Set row height and column width
from openpyxl import load_workbook

# Set the row height and column width of an Excel sheet
def set_row_column_size(file_path):
    wb = load_workbook(file_path)
    ws = wb.active
    # Set the first row's height and the first column's width
    ws.row_dimensions[1].height = 30      # Set the row height
    ws.column_dimensions['A'].width = 20  # Set the column width
    wb.save(file_path)

set_row_column_size('')  # the example path was omitted in the original
print("The row height and column width are set successfully!")
Explanation
This script sets the row height of the first row and the column width of the first column in an Excel file. Adjusting row heights and column widths appropriately improves the readability of a table, especially when cells hold long or complex content, and makes reports cleaner and easier to read.
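If you prefer widths that track the data instead of a fixed value, a small extension of the same idea (a sketch, not part of the original script) is to measure the longest value in each column and size the column accordingly; the 2-character padding below is an arbitrary choice.

from openpyxl import load_workbook
from openpyxl.utils import get_column_letter

def autofit_columns(file_path):
    wb = load_workbook(file_path)
    ws = wb.active
    for col_idx, column_cells in enumerate(ws.columns, start=1):
        # Width roughly follows the longest cell text in the column, plus padding
        longest = max((len(str(cell.value)) for cell in column_cells if cell.value is not None), default=0)
        ws.column_dimensions[get_column_letter(col_idx)].width = longest + 2
    wb.save(file_path)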
3. Delete rows based on a condition
import pandas as pd

# Delete rows in Excel according to a condition
def delete_rows_based_on_condition(file_path, column_name, condition):
    df = pd.read_excel(file_path)
    df = df[df[column_name] != condition]  # Keep only rows that do not match the condition
    df.to_excel(file_path, index=False)

delete_rows_based_on_condition('', 'state', 'invalid')  # the example path was omitted in the original
print("The rows that meet the condition have been deleted!")
Explanation
This script removes from the Excel file every row whose 'state' column equals 'invalid'. This operation is very common during data cleaning; it reduces noise in the data set and improves the accuracy of later analysis.
4. Create a new Excel worksheet
from openpyxl import load_workbook

# Create a new worksheet in an existing Excel file
def create_new_sheet(file_path, sheet_name):
    wb = load_workbook(file_path)
    wb.create_sheet(title=sheet_name)  # Create a new worksheet
    wb.save(file_path)

create_new_sheet('', 'New worksheet')  # the example path was omitted in the original
print("New worksheet was created successfully!")
Explanation
This script creates a new worksheet inside an existing Excel file. This is very useful for organizing data: separating data from different tasks or projects keeps the file structure clear.
5. Import CSV files to Excel
import pandas as pd

# Import a CSV file into an Excel worksheet
def import_csv_to_excel(csv_file, excel_file):
    df = pd.read_csv(csv_file)
    df.to_excel(excel_file, index=False)

import_csv_to_excel('', 'imported_data.xlsx')  # the example CSV path was omitted in the original
print("The CSV file was successfully imported into Excel!")
Explanation
This script imports a CSV file into Excel. Data is often delivered in CSV format, and this script converts it to Excel format in one call for subsequent analysis and processing.
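CSV files from different systems vary in delimiter and encoding, and pd.read_csv exposes parameters for both. A minimal sketch of that variation (the file name and parameter values are only examples):

import pandas as pd

df = pd.read_csv('export.csv', sep=';', encoding='utf-8-sig')  # semicolon-separated, Excel-style UTF-8 with BOM
df.to_excel('imported_data.xlsx', index=False)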
6. Pivot table generation
import pandas as pd

# Generate a pivot table and save it to a new Excel file
def generate_pivot_table(file_path, index_column, values_column, output_file):
    df = pd.read_excel(file_path)
    pivot_table = df.pivot_table(index=index_column, values=values_column, aggfunc='sum')  # Summarize by sum
    pivot_table.to_excel(output_file)

generate_pivot_table('sales_data.xlsx', 'area', 'Sales', 'pivot_output.xlsx')
print("Pivot table generated successfully!")
Explanation
This script builds a pivot table that sums the 'Sales' column grouped by the 'area' column and saves it to a new file. In business analysis, a pivot table quickly summarizes data along different dimensions.
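pivot_table also accepts several index columns and several aggregation functions at once, which is what summarizing "along different dimensions" means in practice. A hedged sketch, assuming the sales file also contains a 'product' column (that column name is illustrative, not from the original):

import pandas as pd

df = pd.read_excel('sales_data.xlsx')
pivot = df.pivot_table(index=['area', 'product'],   # two grouping dimensions
                       values='Sales',
                       aggfunc=['sum', 'mean'])     # total and average per group
pivot.to_excel('pivot_multi.xlsx')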
7. Format Excel
from openpyxl import load_workbook
from openpyxl.styles import Font, Color

# Set the font style of Excel cells
def format_cells(file_path):
    wb = load_workbook(file_path)
    ws = wb.active
    for cell in ws['A']:  # Traverse column A
        cell.font = Font(bold=True, color="FF0000")  # Set the font to bold and red
    wb.save(file_path)

format_cells('')  # the example path was omitted in the original
print("Cell formatting applied successfully!")
Explanation
This script sets the font of column A to bold red. This kind of formatting is often used to emphasize specific data and make a report more visually appealing.
8. Analyze and output descriptive statistics
import pandas as pd

# Output descriptive statistics to Excel
def descriptive_statistics(file_path, output_file):
    df = pd.read_excel(file_path)
    stats = df.describe()  # Calculate descriptive statistics
    stats.to_excel(output_file)

descriptive_statistics('', 'statistics_output.xlsx')  # the example input path was omitted in the original
print("Descriptive statistics output succeeded!")
Explanation
This script calculates descriptive statistics (mean, standard deviation, quartiles, and so on) for an Excel file and saves the results to a new Excel file. This is very useful for understanding the basic characteristics of the data, especially in the early stages of analysis.
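Note that df.describe() summarizes only numeric columns by default; passing include='all' also reports counts, unique values and the most frequent entry for text columns. A small variation of the script above (the input path is a placeholder):

import pandas as pd

df = pd.read_excel('data.xlsx')          # placeholder path
stats_all = df.describe(include='all')   # numeric stats plus count/unique/top/freq for text columns
stats_all.to_excel('statistics_all_columns.xlsx')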
9. Batch rename Excel files
import os

# Batch rename Excel files in the specified directory
def rename_excel_files(directory, prefix):
    for filename in os.listdir(directory):
        if filename.endswith('.xlsx'):
            new_name = f"{prefix}_{filename}"
            os.rename(os.path.join(directory, filename), os.path.join(directory, new_name))
            print(f"Renamed {filename} to {new_name}")

rename_excel_files('/path/to/excel/files', '2024')
Explanation
This script batch-renames all Excel files in the specified directory, adding a prefix to each file name. This kind of batch operation is very convenient when many Excel files have to be processed, for example naming files by year or project for easier management and archiving.
10. Automatically send emails containing Excel data
import os
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.application import MIMEApplication
from email.mime.text import MIMEText

# Automatically send an email with an Excel attachment
def send_email(to_address, subject, body, excel_file):
    from_address = "your_email@"   # the full sender address was omitted in the original
    password = "your_password"
    msg = MIMEMultipart()
    msg['From'] = from_address
    msg['To'] = to_address
    msg['Subject'] = subject
    # Add the message body
    msg.attach(MIMEText(body, 'plain'))
    # Add the Excel attachment
    with open(excel_file, "rb") as attachment:
        part = MIMEApplication(attachment.read(), Name=os.path.basename(excel_file))
        part['Content-Disposition'] = f'attachment; filename="{os.path.basename(excel_file)}"'
        msg.attach(part)
    # Send the email
    with smtplib.SMTP('', 587) as server:   # the SMTP server address was omitted in the original
        server.starttls()
        server.login(from_address, password)
        server.send_message(msg)

# recipient address and attachment path were also omitted in the original
send_email('recipient@', 'Monthly Report', 'Please find attached the monthly report.', '')
print("The email was sent successfully!")
Explanation
This script uses the SMTP protocol to automatically send an email with an Excel file attached. This is especially useful at work, for example for sending financial statements or performance reports to the relevant people every month. Automated mailing saves time and reduces human error.
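One practical caveat: hardcoding the password in the script is risky if the file is shared. A common alternative (a sketch, not part of the original script) is to read the credentials from environment variables; the variable names below are chosen arbitrarily.

import os

# Hypothetical variable names; set them in the shell before running the script
from_address = os.environ.get("REPORT_MAIL_USER")
password = os.environ.get("REPORT_MAIL_PASSWORD")
if not from_address or not password:
    raise RuntimeError("Mail credentials are not set in the environment")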
11. Merge multiple Excel files
import os
import pandas as pd

def merge_excel_files(folder_path, output_file):
    all_data = pd.DataFrame()
    for filename in os.listdir(folder_path):
        if filename.endswith('.xlsx'):
            file_path = os.path.join(folder_path, filename)
            df = pd.read_excel(file_path)
            all_data = pd.concat([all_data, df], ignore_index=True)
    all_data.to_excel(output_file, index=False)

merge_excel_files('your_folder_path', 'merged_file.xlsx')
print("Multiple Excel files merged successfully!")
Explanation
This script merges all Excel files in the specified folder into one file. When data is scattered across multiple files, this function pulls it together for unified analysis.
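When merged rows later need to be traced back to their original file, a small variation (a sketch under the same folder layout; the 'source_file' column name is my own) records the file name in an extra column before concatenating:

import os
import pandas as pd

def merge_with_source(folder_path, output_file):
    frames = []
    for filename in os.listdir(folder_path):
        if filename.endswith('.xlsx'):
            df = pd.read_excel(os.path.join(folder_path, filename))
            df['source_file'] = filename  # remember where each row came from
            frames.append(df)
    pd.concat(frames, ignore_index=True).to_excel(output_file, index=False)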
12. Split Excel file
import os
import pandas as pd

def split_excel_file(file_path, column_name, output_folder):
    df = pd.read_excel(file_path)
    unique_values = df[column_name].unique()
    for value in unique_values:
        sub_df = df[df[column_name] == value]
        output_file = os.path.join(output_folder, f'{value}.xlsx')
        sub_df.to_excel(output_file, index=False)

split_excel_file('', 'department', 'output_folder')  # the example input path was omitted in the original
print("Excel file split successfully!")
Explanation
This script splits an Excel file into multiple files based on the unique values of the specified column. For example, splitting the data by the 'department' column produces one file per department, so each department can view and process its own data independently.
13. Replace the cell content
import pandas as pd

def replace_cell_content(file_path, column_name, old_value, new_value):
    df = pd.read_excel(file_path)
    df[column_name] = df[column_name].replace(old_value, new_value)
    df.to_excel(file_path, index=False)

replace_cell_content('', 'Product Name', 'Old products', 'New Products')  # the example path was omitted in the original
print("Cell content replacement was successful!")
Explanation
This script replaces a specific value in the specified column with a new one. It is handy for correcting errors or updating outdated information in the data.
14. Sort the data
import pandas as pd

def sort_excel_data(file_path, column_name, ascending=True):
    df = pd.read_excel(file_path)
    df = df.sort_values(by=column_name, ascending=ascending)
    df.to_excel(file_path, index=False)

sort_excel_data('', 'Sales', ascending=False)  # the example path was omitted in the original
print("The data was sorted successfully!")
Explanation
This script sorts the data in an Excel file by the specified column, in either ascending or descending order, and saves the sorted data back to the original file. Sorting is very common in data processing and analysis; for example, sorting sales data in descending order by sales quickly surfaces the records with the highest sales.
15. Count the unique values in a specific column
import pandas as pd

def count_unique_values(file_path, column_name):
    df = pd.read_excel(file_path)
    unique_count = df[column_name].nunique()
    print(f"The number of unique values in the {column_name} column is: {unique_count}")

count_unique_values('', 'Customer number')  # the example path was omitted in the original
Explanation
This script counts the number of unique values in the specified column of an Excel file. In data analysis, knowing how many distinct values a column holds helps you quickly grasp the shape of the data; counting unique customer numbers, for example, tells you how many different customers there are.
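If you want the full distribution rather than just the count, pandas' value_counts() lists how often each distinct value occurs. A minimal sketch, assuming the same 'Customer number' column (the input path is a placeholder):

import pandas as pd

df = pd.read_excel('data.xlsx')                # placeholder path
counts = df['Customer number'].value_counts()  # occurrences per distinct value, most frequent first
counts.to_excel('customer_number_counts.xlsx')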
16. Extract specified columns to a new Excel file
import pandas as pd

def extract_columns(file_path, columns, output_file):
    df = pd.read_excel(file_path)
    new_df = df[columns]
    new_df.to_excel(output_file, index=False)

extract_columns('', ['Name', 'age'], 'extracted_columns.xlsx')  # the example input path was omitted in the original
print("The specified columns were extracted successfully!")
Explanation
This script extracts the specified columns from an Excel file and saves them to a new Excel file. When only part of the data is needed, this quickly filters out the required columns and avoids processing a large amount of irrelevant information.
17. Add borders to Excel tables
from openpyxl import load_workbook
from openpyxl.styles import Border, Side

def add_border_to_excel(file_path):
    wb = load_workbook(file_path)
    ws = wb.active
    thin_border = Border(left=Side(style='thin'), right=Side(style='thin'),
                         top=Side(style='thin'), bottom=Side(style='thin'))
    for row in ws.iter_rows():
        for cell in row:
            cell.border = thin_border
    wb.save(file_path)

add_border_to_excel('')  # the example path was omitted in the original
print("The table borders were added successfully!")
Explanation
This script adds a thin border to every cell in an Excel table. Borders make the table clearer and easier to read, especially when the data is printed or presented, and improve the look and professionalism of the table.
18. Check for blank rows in an Excel file and delete them
import pandas as pd

def remove_empty_rows(file_path):
    df = pd.read_excel(file_path)
    df = df.dropna(how='all')  # Drop rows whose columns are all empty
    df.to_excel(file_path, index=False)

remove_empty_rows('')  # the example path was omitted in the original
print("Blank rows deleted successfully!")
Explanation
This script checks the Excel file for rows in which every column is empty and deletes them. Blank rows can distort data processing and analysis; removing them helps ensure the integrity and accuracy of the data.
19. Filter data based on multiple conditions
import pandas as pd

def filter_data_by_multiple_conditions(file_path, conditions, output_file):
    df = pd.read_excel(file_path)
    query_str = ' & '.join([f'{col} {op} {val}' for col, op, val in conditions])
    filtered_df = df.query(query_str)
    filtered_df.to_excel(output_file, index=False)

# Example conditions: age greater than 25 and gender is female
conditions = [('age', '>', 25), ('gender', '==', "'female'")]
filter_data_by_multiple_conditions('', conditions, 'filtered_data.xlsx')  # the example input path was omitted in the original
print("Multi-condition filtering was successful!")
Explanation
This script filters Excel data by conditions on multiple columns and saves the result to a new file. In practice, data often has to satisfy several conditions at once, and this script makes such multi-condition filtering straightforward.
20. Format the date column in Excel
import pandas as pd

def format_date_column(file_path, column_name, date_format):
    df = pd.read_excel(file_path)
    df[column_name] = pd.to_datetime(df[column_name]).dt.strftime(date_format)
    df.to_excel(file_path, index=False)

format_date_column('', 'date', '%Y-%m-%d')  # the example path was omitted in the original
print("Date column formatting succeeded!")
Explanation
This script formats the specified date column of an Excel file. Different business requirements may call for different date formats; this script converts the date column into the required format, which simplifies subsequent analysis and presentation.
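If the column can contain badly formatted dates, pd.to_datetime raises an error by default; passing errors='coerce' turns unparseable values into NaT so the rest of the column still converts. A minimal sketch under that assumption (the file names are placeholders):

import pandas as pd

df = pd.read_excel('data.xlsx')                        # placeholder path
parsed = pd.to_datetime(df['date'], errors='coerce')   # invalid entries become NaT instead of raising
df['date'] = parsed.dt.strftime('%Y-%m-%d')
df.to_excel('data_formatted.xlsx', index=False)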
Summary
That concludes this article on 20 practical Python Excel automation scripts. For more on Python Excel automation, please search my previous articles or continue browsing the related articles below. I hope you will keep supporting me!