Conventions:
import pandas as pd
import numpy as np
from numpy import nan as NaN
Filter missing data
One of the design goals of pandas is to make handling missing data easier. pandas uses NaN as the marker for missing data.
The dropna method makes filtering out missing data convenient.
1. Processing Series objects
Filter out missing data with dropna():
se1 = pd.Series([4, NaN, 8, NaN, 5])
print(se1)
se1.dropna()
Code results:
0 4.0
1 NaN
2 8.0
3 NaN
4 5.0
dtype: float64
0 4.0
2 8.0
4 5.0
dtype: float64
The same result can be obtained by indexing the Series with a Boolean mask:
se1[se1.notnull()]
Code results:
0 4.0
2 8.0
4 5.0
dtype: float64
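As a minimal sketch of the two approaches above, the snippet below checks that dropna() and the Boolean mask give the same Series, and that both keep the original index labels. The variable names (`se`, `dropped`, `masked`, `renumbered`) are illustrative and not part of the original article; `notna()` is used here as the alias of `notnull()`.

```python
import numpy as np
import pandas as pd

se = pd.Series([4, np.nan, 8, np.nan, 5])

dropped = se.dropna()      # new Series without the NaN entries
masked = se[se.notna()]    # the same filter written as a Boolean mask

# Both results hold the same values under the same index labels.
same = dropped.equals(masked)
labels = dropped.index.tolist()

# reset_index(drop=True) renumbers the remaining values from 0.
renumbered = dropped.reset_index(drop=True)

print(same)                       # True
print(labels)                     # [0, 2, 4]
print(renumbered.index.tolist())  # [0, 1, 2]
```

Note that dropna() returns a new object, so `se` itself still holds all five entries afterwards.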
2. Processing DataFrame objects
Handling DataFrame objects is more complicated, because you may want to drop the rows or columns that contain any NaN, or only those that are entirely NaN.
df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])
df1
Code results:
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 2.0 | 3.0 |
| 1 | NaN | NaN | 2.0 |
| 2 | NaN | NaN | NaN |
| 3 | 8.0 | 8.0 | NaN |
By default, dropna() drops every row that contains at least one NaN:
df1.dropna()
Code results:
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 2.0 | 3.0 |
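One point worth checking here is that dropna() does not modify the DataFrame it is called on. The sketch below rebuilds the same data under the illustrative name `df` (not a name from the article) and compares the shapes before and after.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [np.nan, np.nan, 2],
                   [np.nan, np.nan, np.nan],
                   [8, 8, np.nan]])

# dropna() returns a new DataFrame; the default behaves like how="any".
cleaned = df.dropna()

print(df.shape)       # (4, 3) -> the original still has all four rows
print(cleaned.shape)  # (1, 3) -> only the fully populated row remains
```

To keep the filtered result, assign it to a name, as done with `cleaned` above.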
Pass **how='all'** to drop only the rows in which every value is NaN:
df1.dropna(how='all')
Code results:
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 2.0 | 3.0 |
| 1 | NaN | NaN | 2.0 |
| 3 | 8.0 | 8.0 | NaN |
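The difference between the default behaviour and how='all' can be sketched with explicit Boolean masks, in the same spirit as the Series example. This is only an illustration of the semantics, not how pandas implements dropna(); the names `df`, `missing`, `rows_any` and `rows_all` are assumptions made for this sketch.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [np.nan, np.nan, 2],
                   [np.nan, np.nan, np.nan],
                   [8, 8, np.nan]])

missing = df.isna()  # True wherever a value is NaN

# how="any": drop a row if at least one of its values is missing.
rows_any = df[~missing.any(axis=1)]
# how="all": drop a row only if every one of its values is missing.
rows_all = df[~missing.all(axis=1)]

print(rows_any.index.tolist())  # [0]
print(rows_all.index.tolist())  # [0, 1, 3]
print(rows_any.equals(df.dropna(how="any")))  # True
print(rows_all.equals(df.dropna(how="all")))  # True
```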
Pass axis=1 to drop columns instead of rows. To demonstrate, first add a column that is entirely NaN:
df1[3] = NaN
df1
Code results:
|   | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| 0 | 1.0 | 2.0 | 3.0 | NaN |
| 1 | NaN | NaN | 2.0 | NaN |
| 2 | NaN | NaN | NaN | NaN |
| 3 | 8.0 | 8.0 | NaN | NaN |
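df1.dropna(axis=1, how="all")
Code results:
|   | 0 | 1 | 2 |
|---|---|---|---|
| 0 | 1.0 | 2.0 | 3.0 |
| 1 | NaN | NaN | 2.0 |
| 2 | NaN | NaN | NaN |
| 3 | 8.0 | 8.0 | NaN |
Pass thresh=n to keep only the rows that have at least n non-NaN values:
df1.dropna(thresh=1)
df1.dropna(thresh=3)
With df1 as defined above, thresh=1 drops only row 2, which has no non-NaN values, and thresh=3 keeps only row 0, the one row with three non-NaN values.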
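To make the thresh rule and the axis=1 case concrete, the sketch below counts the non-NaN values per row and compares the counts with what dropna() keeps. The frame is rebuilt under the illustrative name `df`, including the all-NaN column 3; the other variable names are likewise assumptions for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3],
                   [np.nan, np.nan, 2],
                   [np.nan, np.nan, np.nan],
                   [8, 8, np.nan]])
df[3] = np.nan  # extra column that is entirely NaN

# Number of non-NaN values in each row.
non_missing = df.notna().sum(axis=1)
print(non_missing.tolist())  # [3, 1, 0, 2]

# thresh=n keeps the rows whose count is at least n.
kept_1 = df.dropna(thresh=1).index.tolist()
kept_3 = df.dropna(thresh=3).index.tolist()
print(kept_1)  # [0, 1, 3]
print(kept_3)  # [0]

# axis=1 with how="all" drops only the columns that are entirely NaN.
kept_cols = df.dropna(axis=1, how="all").columns.tolist()
print(kept_cols)  # [0, 1, 2]
```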