Preface
Recently, as a teaching assistant, I had to enter homework results, but the order of the exported results sheet did not match the order of the teacher's list. It occurred to me to write a pandas script that automatically transcribes the exported results into the teacher's list. Here I record the common ways I used pandas to work with Excel. (Note: this only applies to .xlsx files.)
1. Read an xlsx table: pd.read_excel()
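The original table is as follows (the top-left cell and the name2 cell of row2 are empty):
        name1   name2   name3
row1    1       2       3
row2    4               6
row3    7       8       9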
a) Read the data of the nth sheet (the worksheet tabs at the bottom left of the workbook, where you can view, add, or delete sheets)
import pandas as pd

# Path to the file; change this for each run
path = ""

# sheet_name defaults to 0, i.e. the first sheet is read
sheet = pd.read_excel(path, sheet_name=0)
print(sheet)
"""
  Unnamed: 0  name1  name2  name3
0       row1      1    2.0      3
1       row2      4    NaN      6
2       row3      7    8.0      9
"""
Note that the top-left cell of the original table is empty and is read as "Unnamed: 0". This is because read_excel treats the first row of the table as the column names by default. Row labels are therefore numbered starting from the second row (the first row holds the column names, not data), and if you do not specify a row index, the rows are numbered automatically from 0, as follows.
sheet = pd.read_excel(path)

# View the column names (returned as a NumPy array)
print(sheet.columns.values)

# View the row labels; numbering starts at the second row of the table
# and, if nothing is specified, counts up from 0 (returned as a NumPy array)
print(sheet.index.values)
"""
['Unnamed: 0' 'name1' 'name2' 'name3']
[0 1 2]
"""
b) Column names can also be customized, as follows:
sheet = pd.read_excel(path, names=['col1', 'col2', 'col3', 'col4'])
print(sheet)

# View the column names (returned as a NumPy array)
print(sheet.columns.values)
"""
   col1  col2  col3  col4
0  row1     1   2.0     3
1  row2     4   NaN     6
2  row3     7   8.0     9
['col1' 'col2' 'col3' 'col4']
"""
c) You can also use the nth column as the row index, as follows:
# Use the first column as the row index
sheet = pd.read_excel(path, index_col=0)
print(sheet)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
"""
d) Skip the nth row when reading
# Skip the second row of the file (the first row has index 0)
sheet = pd.read_excel(path, skiprows=[1])
print(sheet)
"""
  Unnamed: 0  name1  name2  name3
0       row2      4    NaN      6
1       row3      7    8.0      9
"""
2. Get the size of the table: shape
path = ""

# Use the first column as the row index
sheet = pd.read_excel(path, index_col=0)
print(sheet)
print('==========================')
print('shape of sheet:', sheet.shape)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
==========================
shape of sheet: (3, 3)
"""
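Since shape is just a (rows, columns) tuple, it can be unpacked directly. Here is a minimal sketch; to keep it runnable without a file, I build an in-memory DataFrame that mirrors the example table, which is my own stand-in rather than part of the original script.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the example table (row names already used as the index)
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

n_rows, n_cols = sheet.shape  # shape is a (rows, columns) tuple
print(n_rows, n_cols)         # 3 3
print(len(sheet))             # 3, the number of rows only
print(sheet.size)             # 9, the total number of cells
```

Note that the row used as column names and the column used as the row index are not counted in the shape.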
3. Methods of indexing data: [ ] / loc[] / iloc[]
1. Direct square bracket indexing
You can use square brackets with a column name, [col_name], to extract a column, and then use square brackets with an index label, [index], to get the value at a specific location in that column. Here we take the column named name1 and print the value in row 1 (index label 1) of that column, which is 4, as follows:
sheet = pd.read_excel(path)

# Get the column named name1
col = sheet['name1']
print(col)

# Print the second value in the column
print(col[1])  # 4
"""
0    1
1    4
2    7
Name: name1, dtype: int64
4
"""
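One thing worth keeping in mind is that col[1] looks up the label 1, which only happens to be the second row because the default index is 0, 1, 2. The sketch below, using a small in-memory table of my own in place of the Excel file, shows the lookup when the row names are used as the index instead.

```python
import pandas as pd

# Default integer index, as produced by pd.read_excel(path) without index_col
sheet = pd.DataFrame({"name1": [1, 4, 7], "name3": [3, 6, 9]})
print(sheet["name1"][1])   # 4: label 1 is also the second row here

# Same data with the row names as the index (as with index_col=0)
labeled = sheet.copy()
labeled.index = ["row1", "row2", "row3"]
col = labeled["name1"]
print(col["row2"])         # 4: look up by label
print(col.iloc[1])         # 4: look up by position
```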
2. The iloc method: indexing by integer position
Index with sheet.iloc[ ]; the square brackets take the integer positions of the rows and columns (counted from 0, after excluding the row used as column names and the column used as the row index).
a) sheet.iloc[1, 2]: extracts the value in row 2, column 3. The first number is the row position and the second is the column position.
b) sheet.iloc[0:2]: extracts the first two rows.
c) sheet.iloc[0:2, 0:2]: extracts the first two columns of the first two rows by slicing.
# Use the first column as the row index
sheet = pd.read_excel(path, index_col=0)

# Read the value in row 2 (row2), column 3 (name3)
# The first number is the row position, the second is the column position
data = sheet.iloc[1, 2]
print(data)  # 6
print('================================')

# Extract the first two rows by slicing
data_slice = sheet.iloc[0:2]
print(data_slice)
print('================================')

# Extract the first two columns of the first two rows by slicing
data_slice = sheet.iloc[0:2, 0:2]
print(data_slice)
"""
6
================================
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
================================
      name1  name2
row1      1    2.0
row2      4    NaN
"""
3. The loc method: indexing by row and column name
Index with sheet.loc[ ]; the square brackets take the name strings of the rows or columns. It is used the same way as iloc, except that the integer positions are replaced with row and column names. This kind of indexing is more intuitive to use.
Note: iloc[1:2] does not include position 2, but loc['row1':'row2'] does include 'row2'.
# Use the first column as the row index
sheet = pd.read_excel(path, index_col=0)

# Read the value in row 2 (row2), column 3 (name3)
# The first name is the row label, the second is the column label
data = sheet.loc['row2', 'name3']
print(data)  # 6
print('================================')

# Extract the first two rows by slicing
data_slice = sheet.loc['row1':'row2']
print(data_slice)
print('================================')

# Extract the first two columns of the first two rows by slicing
data_slice1 = sheet.loc['row1':'row2', 'name1':'name2']
print(data_slice1)
"""
6
================================
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
================================
      name1  name2
row1      1    2.0
row2      4    NaN
"""
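The difference in slice endpoints between iloc and loc is easy to trip over, so here is a small sketch that makes it visible. The DataFrame is an in-memory copy of the example table that I construct for illustration, standing in for the sheet read from Excel.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the example table read with index_col=0
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

by_position = sheet.iloc[1:2]          # end position 2 is excluded
by_label = sheet.loc["row1":"row2"]    # end label 'row2' is included

print(list(by_position.index))         # ['row2']
print(list(by_label.index))            # ['row1', 'row2']

# To get the same two rows by position, the slice has to end one past the last row
print(sheet.iloc[0:2].equals(by_label))  # True
```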
4. Check whether a value is empty: np.isnan() / pd.isnull()
1. Use the numpy library's isnan() or the pandas library's isnull() to check whether a value is nan.
import numpy as np

sheet = pd.read_excel(path)

# Get the column named name2
col = sheet['name2']
print(np.isnan(col[1]))  # True
print(pd.isnull(col[1]))  # True
"""
True
True
"""
2. Use str() to convert the value to a string and check whether it equals 'nan'.
sheet = pd.read_excel(path)

# Get the column named name2
col = sheet['name2']
print(col)

# Check the second value in the column
if str(col[1]) == 'nan':
    print('col[1] is nan')
"""
0    2.0
1    NaN
2    8.0
Name: name2, dtype: float64
col[1] is nan
"""
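The reason a dedicated check is needed is that NaN never compares equal to anything, including itself. The sketch below shows this on an in-memory column that I build to mirror name2, and also checks the whole column at once with Series.isna(), which goes slightly beyond the single-value checks above.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the name2 column, with the default 0, 1, 2 index
col = pd.Series([2.0, np.nan, 8.0], name="name2")

value = col[1]
print(value == np.nan)   # False: NaN is never equal to NaN
print(pd.isna(value))    # True

mask = col.isna()        # element-wise check over the whole column
print(mask.tolist())     # [False, True, False]
print(int(mask.sum()))   # 1, the number of empty cells
```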
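5. Find rows that match a condition
The code below shows the idea:
# The first column is used as the row index
sheet = pd.read_excel(path, index_col=0)

# Extract the rows where name1 == 1
mask = (sheet['name1'] == 1)
x = sheet.loc[mask]
print(x)
"""
      name1  name2  name3
row1      1    2.0      3
"""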
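The same mask approach extends to several conditions. As a sketch, on an in-memory copy of the example table that I build for illustration, conditions can be combined with & (and) and | (or); each condition needs its own parentheses.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the example table read with index_col=0
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Rows where name1 > 1 AND name3 < 9
both = sheet.loc[(sheet["name1"] > 1) & (sheet["name3"] < 9)]
print(list(both.index))        # ['row2']

# Rows where name1 == 1 OR name3 == 9
either = sheet.loc[(sheet["name1"] == 1) | (sheet["name3"] == 9)]
print(list(either.index))      # ['row1', 'row3']

# Rows where name2 is not empty
has_value = sheet.loc[sheet["name2"].notna()]
print(list(has_value.index))   # ['row1', 'row3']
```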
6. Modify element values: replace()
sheet['name2'].replace(2, 100, inplace=True): changes the value 2 in column name2 to 100, in place.
sheet['name2'].replace(2, 100, inplace=True)
print(sheet)
"""
      name1  name2  name3
row1      1  100.0      3
row2      4    NaN      6
row3      7    8.0      9
"""
sheet['name2'].replace(np.nan, 100, inplace=True): changes the empty values (nan) in column name2 to 100, in place.
import numpy as np

sheet['name2'].replace(np.nan, 100, inplace=True)
print(sheet)
print(type(sheet.loc['row2', 'name2']))
"""
      name1  name2  name3
row1      1    2.0      3
row2      4  100.0      6
row3      7    8.0      9
<class 'numpy.float64'>
"""
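As far as I know, recent pandas versions may warn when replace(..., inplace=True) is called on a column selected with sheet['name2'], because the selection can be a temporary copy. A sketch of an alternative is to assign the result back to the column; fillna() is used here for the empty cells, which is my substitution for replace(np.nan, ...), and the DataFrame is an in-memory stand-in for the example table.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the example table read with index_col=0
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Without inplace, replace() returns a new Series and leaves sheet untouched
replaced = sheet["name2"].replace(2, 100)
print(replaced.tolist())

# Assigning the result back updates the column in the table
sheet["name2"] = sheet["name2"].fillna(100)
print(sheet["name2"].tolist())   # [2.0, 100.0, 8.0]
```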
7. Add data: [ ]
To add a column, assign to it directly using square brackets: sheet['name to be added'].
sheet['name_add'] = [55, 66, 77] : Add a column named name_add with values [55, 66, 77].
path = ""

# Use the first column as the row index
sheet = pd.read_excel(path, index_col=0)
print(sheet)
print('====================================')

# Add a column named name_add with values [55, 66, 77]
sheet['name_add'] = [55, 66, 77]
print(sheet)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
====================================
      name1  name2  name3  name_add
row1      1    2.0      3        55
row2      4    NaN      6        66
row3      7    8.0      9        77
"""
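A new column can also be computed from existing ones, and a row can be added in a similar way by assigning to a new label with loc. This is only a sketch on an in-memory copy of the example table; the names total and row4 and the values in the new row are made up for illustration.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the example table read with index_col=0
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Add a column computed from two existing columns ('total' is a made-up name)
sheet["total"] = sheet["name1"] + sheet["name3"]
print(sheet["total"].tolist())   # [4, 10, 16]

# Add a row by assigning to a label that does not exist yet ('row4' is made up);
# the list needs one value per column
sheet.loc["row4"] = [10, 11.0, 12, 22]
print(sheet.shape)               # (4, 4)
```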
8. Delete data: del / drop()
a) del(sheet['name3']): delete a column using the del statement
sheet = pd.read_excel(path, index_col=0)

# Delete the column 'name3' using del
del(sheet['name3'])
print(sheet)
"""
      name1  name2
row1      1    2.0
row2      4    NaN
row3      7    8.0
"""
b) sheet.drop('row1', axis=0)
Use the drop method to delete row1; use axis=1 to delete a column instead.
When the inplace parameter is True, nothing is returned and the data is removed directly from the original table.
When the inplace parameter is False (the default), the original data is not modified, and the modified data is returned.
sheet.drop('row1', axis=0, inplace=True)
print(sheet)
"""
      name1  name2  name3
row2      4    NaN      6
row3      7    8.0      9
"""
c) sheet.drop(labels=['name1', 'name2'], axis=1)
Use the labels=[ ] parameter to delete multiple rows or columns.
# Delete multiple columns; inplace defaults to False, so the result is returned
print(sheet.drop(labels=['name1', 'name2'], axis=1))
"""
      name3
row1      3
row2      6
row3      9
"""
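To make the inplace=False behaviour concrete, here is a short sketch on an in-memory copy of the example table (my own stand-in for the Excel sheet). It also uses the columns= and index= keywords of drop, which are an alternative to passing labels together with axis.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the example table read with index_col=0
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Same as drop(labels=['name1', 'name2'], axis=1); a new table is returned
trimmed = sheet.drop(columns=["name1", "name2"])
print(list(trimmed.columns))   # ['name3']
print(list(sheet.columns))     # ['name1', 'name2', 'name3'], original unchanged

# Same as drop('row1', axis=0)
no_row1 = sheet.drop(index="row1")
print(list(no_row1.index))     # ['row2', 'row3']
```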
9. Save to an Excel file: to_excel()
1. Save pandas data as an .xlsx file.
names = ['a', 'b', 'c']
scores = [99, 100, 99]

result_excel = pd.DataFrame()
result_excel["Name"] = names
result_excel["Scoring."] = scores

# Write to excel
result_excel.to_excel('')
2. Save a modified Excel table as an .xlsx file.
For example, save the file after changing nan to 100 in the original table:
import numpy as np

# Use the first column as the row index
sheet = pd.read_excel(path, index_col=0)
sheet['name2'].replace(np.nan, 100, inplace=True)
sheet.to_excel('')
Opening the saved file shows the result: the empty cell in column name2 now holds 100.
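To check that a saved file reads back as expected, a round trip can be sketched as below. It assumes the openpyxl package is installed (pandas uses it for .xlsx files); the file name result.xlsx, the sheet name scores, and the column name Score are placeholders of mine, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile

import pandas as pd

result = pd.DataFrame({"Name": ["a", "b", "c"], "Score": [99, 100, 99]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "result.xlsx")  # placeholder file name

    # index=False leaves the automatic 0, 1, 2 row labels out of the file
    result.to_excel(out_path, index=False, sheet_name="scores")

    # Read the same sheet back by name
    loaded = pd.read_excel(out_path, sheet_name="scores")

print(loaded)
```

Without index=False, the row labels are written as an extra first column, which is where an "Unnamed: 0" column comes from when the file is read again.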