To delete rows and columns from a pandas.DataFrame, use the drop() method.
Prior to version 0.21.0, rows and columns are specified with the parameters labels and axis. Starting with 0.21.0, you can also use index or columns.
This article covers the following topics.
- Delete specified rows from a DataFrame
  - Specify by row name (row label)
  - Specify by row number
  - Notes when row names are not set
- Delete specified columns from a DataFrame
  - Specify by column name (column label)
  - Specify by column number
- Delete multiple rows and columns at once
For removing missing values (NaN) and removing rows with duplicate elements, please refer to the related articles, such as:
- Removing, replacing, and extracting missing values NaN in pandas (dropna, fillna, isnull)
The following data is used as an example in the sample code.
import pandas as pd

df = pd.read_csv('./data/12/sample_pandas_normal.csv', index_col=0)
print(df)
#          age state  point
# name
# Alice     24    NY     64
# Bob       42    CA     92
# Charlie   18    CA     70
# Dave      68    TX     70
# Ellen     24    CA     88
# Frank     30    NY     57
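If the CSV file is not at hand, the same data can be rebuilt inline. The following is a minimal sketch: the values are copied from the printed output above, and building the frame from a dict is an assumption of this example, not how the original file was produced.

```python
import pandas as pd

# Rebuild the sample data by hand from the printed output above.
# The dict-based construction is an illustrative assumption.
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(
        ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'
    ),
)

print(df.shape)
# (6, 3)
```

With df defined this way, the drop() examples in this article should behave the same as with the CSV-based frame.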
Delete specified rows from a DataFrame
Specify by row name (row label)
Specify the row name with the first argument labels and the direction with the second argument axis. For rows, set axis=0.
print(df.drop('Charlie', axis=0))
#        age state  point
# name
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57
The default value is axis = 0, so axis can be omitted.
print(df.drop('Charlie'))
#        age state  point
# name
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57
From version 0.21.0 onward, rows can also be specified with the parameter index.
print(df.drop(index='Charlie'))
#        age state  point
# name
# Alice   24    NY     64
# Bob     42    CA     92
# Dave    68    TX     70
# Ellen   24    CA     88
# Frank   30    NY     57
To delete multiple rows at once, specify the row names as a list.
print(df.drop(['Bob', 'Dave', 'Frank']))
#          age state  point
# name
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

print(df.drop(index=['Bob', 'Dave', 'Frank']))
#          age state  point
# name
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88
By default, the original DataFrame is left unchanged and a new DataFrame is returned. If the parameter inplace is set to True, the original DataFrame is modified in place; in that case no new DataFrame is returned and the return value is None.
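The behavior of inplace can be checked with a short sketch. The small frame below is a hypothetical subset of the sample data, built inline so the snippet runs on its own; copy() is used only to keep the original frame intact for comparison.

```python
import pandas as pd

# Hypothetical three-row subset of the sample data, built inline.
df = pd.DataFrame(
    {'age': [24, 42, 18], 'state': ['NY', 'CA', 'CA']},
    index=pd.Index(['Alice', 'Bob', 'Charlie'], name='name'),
)

df_org = df.copy()  # work on a copy so that df itself stays untouched
result = df_org.drop(index=['Bob'], inplace=True)

print(result)
# None
print(df_org.index.tolist())
# ['Alice', 'Charlie']
print(df.index.tolist())
# ['Alice', 'Bob', 'Charlie']
```

Because the return value is None when inplace=True, assigning the result back (df_org = df_org.drop(..., inplace=True)) would discard the DataFrame.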
Specify by row number
To specify by row number, use the index attribute of the DataFrame.
Indexing the index attribute with a row number in [] returns the corresponding row name. Multiple row numbers can be given as a list.
print(df.index[[1, 3, 5]])
# Index(['Bob', 'Dave', 'Frank'], dtype='object', name='name')
Pass these row names to the first argument labels or to the parameter index of drop().
print(df.drop(df.index[[1, 3, 5]]))
#          age state  point
# name
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88

print(df.drop(index=df.index[[1, 3, 5]]))
#          age state  point
# name
# Alice     24    NY     64
# Charlie   18    CA     70
# Ellen     24    CA     88
Notes when row names are not set
If no row names are set, the index defaults to sequential integers. Be careful when the index consists of numbers like this rather than strings.
df_noindex = pd.read_csv('./data/12/sample_pandas_normal.csv')
print(df_noindex)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

print(df_noindex.index)
# RangeIndex(start=0, stop=6, step=1)
While the index is sequential, the result is the same whether you pass the numbers directly or go through the index attribute.
print(df_noindex.drop([1, 3, 5]))
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88

print(df_noindex.drop(df_noindex.index[[1, 3, 5]]))
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
If the index is no longer sequential, for example after sorting, the results differ. When numbers are passed directly, the rows whose row labels match those numbers are deleted; when the index attribute is used, the rows at those row positions are deleted.
df_noindex_sort = df_noindex.sort_values('state')
print(df_noindex_sort)
#       name  age state  point
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64
# 5    Frank   30    NY     57
# 3     Dave   68    TX     70

print(df_noindex_sort.index)
# Int64Index([1, 2, 4, 0, 5, 3], dtype='int64')

print(df_noindex_sort.drop([1, 3, 5]))
#       name  age state  point
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88
# 0    Alice   24    NY     64

print(df_noindex_sort.drop(df_noindex_sort.index[[1, 3, 5]]))
#     name  age state  point
# 1    Bob   42    CA     92
# 4  Ellen   24    CA     88
# 5  Frank   30    NY     57
Delete specified columns from a DataFrame
Specify by column name (column label)
Specify the column name with the first argument labels and the direction with the second argument axis. For columns, set axis=1.
print(df.drop('state', axis=1))
#          age  point
# name
# Alice     24     64
# Bob       42     92
# Charlie   18     70
# Dave      68     70
# Ellen     24     88
# Frank     30     57
From version 0.21.0 onward, columns can also be specified with the parameter columns.
print(df.drop(columns='state'))
#          age  point
# name
# Alice     24     64
# Bob       42     92
# Charlie   18     70
# Dave      68     70
# Ellen     24     88
# Frank     30     57
To delete multiple columns at once, specify the column names as a list.
print(df.drop(['state', 'point'], axis=1))
#          age
# name
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

print(df.drop(columns=['state', 'point']))
#          age
# name
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30
The argument inplace works the same way as for rows.
df_org = df.copy()
df_org.drop(columns=['state', 'point'], inplace=True)
print(df_org)
#          age
# name
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30
Specify by column number
To specify by column number, use the columns attribute of the DataFrame.
print(df.columns[[1, 2]])
# Index(['state', 'point'], dtype='object')

print(df.drop(df.columns[[1, 2]], axis=1))
#          age
# name
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30

print(df.drop(columns=df.columns[[1, 2]]))
#          age
# name
# Alice     24
# Bob       42
# Charlie   18
# Dave      68
# Ellen     24
# Frank     30
If the column names are integers, the same caution applies as for rows above: a number passed directly is treated as a column label, not as a column position.
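To make the integer-column caveat concrete, here is a minimal sketch. The frame df_int and its column labels 2, 0, 1 are hypothetical and chosen only so that labels and positions disagree.

```python
import pandas as pd

# Hypothetical frame whose column labels are the integers 2, 0, 1,
# deliberately out of positional order.
df_int = pd.DataFrame([[10, 20, 30], [40, 50, 60]], columns=[2, 0, 1])

# Passing 0 directly drops the column *labeled* 0 (the middle column).
by_label = df_int.drop(columns=0)
print(by_label.columns.tolist())
# [2, 1]

# Going through df_int.columns drops the column at *position* 0 (labeled 2).
by_position = df_int.drop(columns=df_int.columns[[0]])
print(by_position.columns.tolist())
# [0, 1]
```

Going through the columns attribute makes the intent (position, not label) explicit, which is the same pattern used for rows with the index attribute.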
Delete multiple rows and columns at once
From version 0.21.0 onward, rows and columns can be deleted in a single call by specifying the parameters index and columns together.
Rows and columns can also be specified by number here, and the parameter inplace can be used as well.
print(df.drop(index=['Bob', 'Dave', 'Frank'], columns=['state', 'point']))
#          age
# name
# Alice     24
# Charlie   18
# Ellen     24

print(df.drop(index=df.index[[1, 3, 5]], columns=df.columns[[1, 2]]))
#          age
# name
# Alice     24
# Charlie   18
# Ellen     24
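Combining index, columns, and inplace in one call can be sketched as follows. The data is rebuilt inline from the sample output so the snippet is self-contained, which is an assumption of this example; the copy is only there to keep df unchanged.

```python
import pandas as pd

# Sample data rebuilt inline (values copied from the output shown earlier).
df = pd.DataFrame(
    {
        'age': [24, 42, 18, 68, 24, 30],
        'state': ['NY', 'CA', 'CA', 'TX', 'CA', 'NY'],
        'point': [64, 92, 70, 70, 88, 57],
    },
    index=pd.Index(
        ['Alice', 'Bob', 'Charlie', 'Dave', 'Ellen', 'Frank'], name='name'
    ),
)

df_org = df.copy()
# Drop rows at positions 1, 3, 5 and columns at positions 1, 2, in place.
df_org.drop(
    index=df_org.index[[1, 3, 5]],
    columns=df_org.columns[[1, 2]],
    inplace=True,
)
print(df_org)
#          age
# name
# Alice     24
# Charlie   18
# Ellen     24
```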