Summary of common pandas methods for processing Excel table data in Python

Preface

Recently, as a teaching assistant, I had to update homework grades, but the order of the exported grade sheet did not match the order of the teacher's roster. It occurred to me to write a pandas script that automatically copies the exported grades into the teacher's roster. This article records the common ways of handling Excel files with pandas that I used along the way. (Note: this applies only to .xlsx files.)

1. Reading an xlsx table: pd.read_excel()

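The original table looks like this (the top-left cell and the name2 cell of row2 are empty):

        name1  name2  name3
row1    1      2      3
row2    4             6
row3    7      8      9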

a) Read the data of the nth sheet (sheets are the tabs at the bottom left of the workbook, where they can be viewed, added, or deleted)

import pandas as pd
# Path to the .xlsx file; change it for each run
path = ""
# sheet_name defaults to 0, i.e., reads the data from the first sheet
sheet = pd.read_excel(path, sheet_name=0)
print(sheet)
"""
  Unnamed: 0  name1  name2  name3
0       row1      1    2.0      3
1       row2      4    NaN      6
2       row3      7    8.0      9
"""

Note that the empty top-left cell of the original table is read as "Unnamed: 0". This is because read_excel, by default, treats the first row of the table as the column index names. Row index labels are therefore assigned starting from the second row (the first row holds column names, not data), and unless you specify otherwise they are numbered automatically from 0, as follows.

sheet = pd.read_excel(path)
# View the column index names; returns a NumPy array
print(sheet.columns.values)
# View the row index labels; by default they start at the second row and are numbered from 0; returns a NumPy array
print(sheet.index.values)
"""
['Unnamed: 0' 'name1' 'name2' 'name3']
[0 1 2]
"""

b) The column index names can also be customized, as follows:

sheet = pd.read_excel(path, names=['col1', 'col2', 'col3', 'col4'])
print(sheet)
# View the column index names; returns a NumPy array
print(sheet.columns.values)
"""
   col1  col2  col3  col4
0  row1     1   2.0     3
1  row2     4   NaN     6
2  row3     7   8.0     9
['col1' 'col2' 'col3' 'col4']
"""

c) You can also specify the nth column as the row index, as follows:

# Specify the first column as the row index
sheet = pd.read_excel(path, index_col=0)
print(sheet)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
"""

d) Skip the nth row when reading

# Skip the 2nd row of the file (rows are counted from 0, so its index is 1)
sheet = pd.read_excel(path, skiprows=[1])
print(sheet)
"""
  Unnamed: 0  name1  name2  name3
0       row2      4    NaN      6
1       row3      7    8.0      9
"""

2. Getting the size of the table: shape

path = ""
# Specify the first column as the row index
sheet = pd.read_excel(path, index_col=0)
print(sheet)
print('==========================')
print('shape of sheet:', sheet.shape)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
==========================
shape of sheet: (3, 3)
"""
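Since shape is a (rows, columns) tuple, it can be unpacked directly. The sketch below rebuilds the example table in memory as a stand-in for the Excel file (an assumption made so that it runs without a file on disk) and shows a few related size queries.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the example table (assumption: same values as the
# sheet read with index_col=0 above).
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# shape is a (number of rows, number of columns) tuple
n_rows, n_cols = sheet.shape
print(n_rows, n_cols)  # 3 3

# len() counts rows only; size counts all cells, including NaN
print(len(sheet))   # 3
print(sheet.size)   # 9
```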

3. Methods of indexing data: [ ] / loc[] / iloc[]

1. Direct square bracket indexing

You can use square brackets with a column name, [col_name], to extract a column, and then square brackets with an index, [index], to get the value at a specific position in that column. Here we take the column named name1 and print the value in row 1 (index 1) of that column, which is 4:

sheet = pd.read_excel(path)
# Retrieve the column named name1
col = sheet['name1']
print(col)
# Print the second value in the column (index label 1)
print(col[1])  # 4
"""
0    1
1    4
2    7
Name: name1, dtype: int64
4
"""
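One caveat: col[1] above works because the default row index is the integers 0, 1, 2, so 1 is a valid label. If the sheet is read with index_col=0, the labels become strings, and recent pandas versions warn when an integer in plain square brackets is treated as a position. A minimal sketch of the two explicit alternatives, using an in-memory stand-in for the table (an assumption for illustration):

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the table as read with index_col=0 (assumption).
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

col = sheet["name1"]

# Access by row label
by_label = col["row2"]
# Access by integer position, independent of the labels
by_position = col.iloc[1]

print(by_label, by_position)  # 4 4
```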

2. The iloc method: indexing by integer position

Use iloc[ ] to index. The square brackets take the integer positions of the rows and columns, counted from 0 after excluding the row used as column names and the column used as the row index.
a) iloc[1, 2] : extract the data in row 2, column 3. The first value is the row index and the second is the column index.

b) iloc[0:2] : extract the first two rows of data.

c) iloc[0:2, 0:2] : extract the first two columns of the first two rows by slicing.

# Use the first column of data as the row index
sheet = pd.read_excel(path, index_col=0)
# Read row 2 (row2), column 3: the value 6
# The first value is the row index, the second is the column index
data = sheet.iloc[1, 2]
print(data)  # 6
print('================================')
# Extract the first two rows by slicing
data_slice = sheet.iloc[0:2]
print(data_slice)
print('================================')
# Extract the first two columns of the first two rows by slicing
data_slice = sheet.iloc[0:2, 0:2]
print(data_slice)
"""
6
================================
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
================================
      name1  name2
row1      1    2.0
row2      4    NaN
"""

3. The loc method: indexing by row and column name

Use loc[ ] to index. The square brackets take the name strings of the rows or columns. It is used the same way as iloc, except that the integer positions are replaced with row and column names. This kind of indexing is more intuitive to use.

Note: iloc[1:2] does not include position 2, but loc['row1':'row2'] does include 'row2'.

# Use the first column of data as the row index
sheet = pd.read_excel(path, index_col=0)
# Read row 2 (row2), column 3 (name3): the value 6
# The first value is the row label, the second is the column label
data = sheet.loc['row2', 'name3']
print(data)  # 6
print('================================')
# Extract the first two rows by slicing
data_slice = sheet.loc['row1': 'row2']
print(data_slice)
print('================================')
# Extract the first two columns of the first two rows by slicing
data_slice1 = sheet.loc['row1': 'row2', 'name1': 'name2']
print(data_slice1)
"""
6
================================
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
================================
      name1  name2
row1      1    2.0
row2      4    NaN
"""
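The difference in slice endpoints is easy to check side by side. The following sketch uses an in-memory stand-in for the table (an assumption so that no file is needed) and compares which rows each slice returns.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the table as read with index_col=0 (assumption).
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# iloc slices are half-open: position 2 is excluded
iloc_rows = sheet.iloc[1:2].index.tolist()
print(iloc_rows)  # ['row2']

# loc slices are closed: the end label 'row2' is included
loc_rows = sheet.loc["row1":"row2"].index.tolist()
print(loc_rows)  # ['row1', 'row2']

# So iloc[0:2] and loc['row1':'row2'] select the same two rows here
same = sheet.iloc[0:2].equals(sheet.loc["row1":"row2"])
print(same)  # True
```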

4. Checking whether data is empty: np.isnan() / pd.isnull()

1. Use the numpy library's isnan() or the pandas library's isnull() to check whether a value is nan.

import numpy as np

sheet = pd.read_excel(path)
# Retrieve the column named name2
col = sheet['name2']

print(np.isnan(col[1]))  # True
print(pd.isnull(col[1]))  # True
"""
True
True
"""

2. Use str() to convert the value to a string and check whether it equals 'nan'.

sheet = pd.read_excel(path)
# Retrieve the column named name2
col = sheet['name2']
print(col)
# Check the second value in the column
if str(col[1]) == 'nan':
    print('col[1] is nan')
"""
0    2.0
1    NaN
2    8.0
Name: name2, dtype: float64
col[1] is nan
"""
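The checks above test one cell at a time. To check a whole column at once, the isna() method (equivalent to isnull()) returns a boolean Series. A minimal sketch on an in-memory stand-in for the table (an assumption for illustration):

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the table as read with index_col=0 (assumption).
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Boolean Series: True where the cell is empty (NaN)
missing = sheet["name2"].isna()
print(missing.tolist())  # [False, True, False]

# Number of empty cells in the column
n_missing = int(missing.sum())
print(n_missing)  # 1

# Row labels of the empty cells
missing_rows = sheet[missing].index.tolist()
print(missing_rows)  # ['row2']
```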

5. Finding data that meets a condition

The code below illustrates the idea:

# Extract the rows where name1 == 1
mask = (sheet['name1'] == 1)
x = sheet[mask]
print(x)
"""
      name1  name2  name3
row1      1    2.0      3
"""
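Several conditions can be combined into one mask with & (and) and | (or); each condition needs its own parentheses. A short sketch, again on an in-memory stand-in for the table (an assumption for illustration; the thresholds are arbitrary):

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the table as read with index_col=0 (assumption).
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Rows where name1 > 1 AND name3 < 9
both = sheet[(sheet["name1"] > 1) & (sheet["name3"] < 9)]
print(both.index.tolist())  # ['row2']

# Rows where name1 == 1 OR name2 is empty
either = sheet[(sheet["name1"] == 1) | (sheet["name2"].isna())]
print(either.index.tolist())  # ['row1', 'row2']
```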

6. Modifying element values: replace()

sheet['name2'].replace(2, 100, inplace=True) : replace the value 2 in column name2 with 100, in place.

sheet['name2'].replace(2, 100, inplace=True)
print(sheet)
"""
      name1  name2  name3
row1      1  100.0      3
row2      4    NaN      6
row3      7    8.0      9
"""

sheet['name2'].replace(np.nan, 100, inplace=True) : replace the empty elements (nan) in column name2 with 100, in place.

import numpy as np
sheet['name2'].replace(np.nan, 100, inplace=True)
print(sheet)
print(type(sheet.loc['row2', 'name2']))
"""
      name1  name2  name3
row1      1    2.0      3
row2      4  100.0      6
row3      7    8.0      9
<class 'numpy.float64'>
"""
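In recent pandas versions, calling replace(..., inplace=True) on a column selected with sheet['name2'] may emit a warning and, depending on the version and settings, may not update sheet itself. A hedged alternative is to assign the result back to the column. The sketch below does this on an in-memory stand-in for the table (an assumption for illustration) and uses fillna() for the empty cells.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the table as read with index_col=0 (assumption).
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Replace the value 2 with 100 and assign the result back to the column
sheet["name2"] = sheet["name2"].replace(2, 100)

# Fill the empty cells (NaN) with 100; fillna is the dedicated method for this
sheet["name2"] = sheet["name2"].fillna(100)

print(sheet["name2"].tolist())  # [100.0, 100.0, 8.0]
```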

7. Adding data: [ ]

To add a column, assign to it directly using square brackets: [name of the new column].

sheet['name_add'] = [55, 66, 77] : add a column named name_add with the values [55, 66, 77].

path = ""
# Specify the first column as the row index
sheet = pd.read_excel(path, index_col=0)
print(sheet)
print('====================================')
# Add a column named name_add with values [55, 66, 77].
sheet['name_add'] = [55, 66, 77]
print(sheet)
"""
      name1  name2  name3
row1      1    2.0      3
row2      4    NaN      6
row3      7    8.0      9
====================================
      name1  name2  name3  name_add
row1      1    2.0      3        55
row2      4    NaN      6        66
row3      7    8.0      9        77
"""
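A row can be added in a similar way by assigning to loc with a new row label. This is a minimal sketch on an in-memory stand-in for the table; the label row4 and its values are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the table as read with index_col=0 (assumption).
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Assigning to a label that does not exist yet appends a new row.
# 'row4' and its values are hypothetical.
sheet.loc["row4"] = [10, 11.0, 12]

print(sheet.shape)                # (4, 3)
print(sheet.index.tolist())       # ['row1', 'row2', 'row3', 'row4']
```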

8. Deleting data: del / drop()

a) del sheet['name3'] : delete a column with the del statement

sheet = pd.read_excel(path, index_col=0)
# Delete the column 'name3' with the del statement
del sheet['name3']
print(sheet)
"""
      name1  name2
row1      1    2.0
row2      4    NaN
row3      7    8.0
"""

b) sheet.drop('row1', axis=0)

Use the drop method to delete row1; use axis=1 to delete a column instead.

When the inplace parameter is True, nothing is returned and the row is removed directly from the original data.

When the inplace parameter is False (the default), the original data is not modified and the modified copy is returned.

sheet.drop('row1', axis=0, inplace=True)
print(sheet)
"""
      name1  name2  name3
row2      4    NaN      6
row3      7    8.0      9
"""

c) sheet.drop(labels=['name1', 'name2'], axis=1)

Use the labels=[ ] parameter to delete multiple rows or columns.

# Delete multiple columns; inplace defaults to False, so the result is returned
print(sheet.drop(labels=['name1', 'name2'], axis=1))
"""
      name3
row1      3
row2      6
row3      9
"""
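drop also accepts the index= and columns= keywords, which avoid having to remember the axis numbers. The sketch below, on an in-memory stand-in for the table (an assumption for illustration), also confirms that without inplace=True the original data is left untouched.

```python
import numpy as np
import pandas as pd

# In-memory stand-in for the table as read with index_col=0 (assumption).
sheet = pd.DataFrame(
    {"name1": [1, 4, 7], "name2": [2.0, np.nan, 8.0], "name3": [3, 6, 9]},
    index=["row1", "row2", "row3"],
)

# Same as drop(labels=['name1', 'name2'], axis=1)
without_cols = sheet.drop(columns=["name1", "name2"])
print(without_cols.columns.tolist())  # ['name3']

# Same as drop('row1', axis=0)
without_row = sheet.drop(index="row1")
print(without_row.index.tolist())  # ['row2', 'row3']

# inplace defaults to False, so the original is unchanged
print(sheet.shape)  # (3, 3)
```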

9. Saving to an Excel file: to_excel()

1. Save data held in a pandas DataFrame as an .xlsx file.

names = ['a', 'b', 'c']
scores = [99, 100, 99]
result_excel = pd.DataFrame()
result_excel["Name"] = names
result_excel["Score"] = scores
# Write to an Excel file
result_excel.to_excel('')

2. Save a modified Excel table as an .xlsx file.

For example, save the file after replacing nan with 100 in the original table:

import numpy as np 
# Specify the first column as the row index
sheet = pd.read_excel(path, index_col=0)
sheet['name2'].replace(np.nan, 100, inplace=True)
sheet.to_excel('')

Opening the saved file shows that the empty name2 cell of row2 now contains 100.
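Coming back to the task from the preface, reordering exported results to match a roster can be sketched with a left merge, which keeps the row order of the left table. The data and names below (exported, roster, the Name and Score columns) are hypothetical stand-ins, not the actual files; the result could then be written out with to_excel() as in section 9.

```python
import pandas as pd

# Hypothetical exported results, in an arbitrary order
exported = pd.DataFrame({"Name": ["b", "c", "a"], "Score": [100, 99, 99]})

# Hypothetical roster order wanted by the teacher
roster = ["a", "b", "c"]

# A left merge keeps the order of the left table (the roster);
# names missing from the export would get NaN in Score.
ordered = pd.DataFrame({"Name": roster}).merge(exported, on="Name", how="left")

print(ordered["Name"].tolist())   # ['a', 'b', 'c']
print(ordered["Score"].tolist())  # [99, 100, 99]
```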

Summary

That concludes this overview of common pandas methods for processing Excel table data in Python.