SoFunction
Updated on 2025-03-01

How to filter out missing data in pandas

Conventions (imports used in all examples below):

import pandas as pd
import numpy as np
from numpy import nan as NaN

Filter missing data

One of the design goals of pandas is to make handling missing data as easy as possible. pandas uses NaN (the floating-point "not a number" value) to mark missing data.

The dropna method provides a convenient way to filter out missing data.
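A minimal sketch (not part of the original article) of how pandas marks missing values: in a float Series, Python's None is stored as NaN as well, isna() flags both, and dropna() removes them. The Series s is only an illustrative example.

```python
import numpy as np
import pandas as pd

# None is stored as NaN in a float Series, so both count as missing
s = pd.Series([1.0, np.nan, 3.0, None])

mask = s.isna()          # True where a value is missing
print(mask.tolist())     # [False, True, False, True]

cleaned = s.dropna()     # keeps only the non-missing values
print(cleaned.tolist())  # [1.0, 3.0]
```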

1. Processing Series objects

Filter out missing data with dropna():

se1=pd.Series([4,NaN,8,NaN,5])
print(se1)
se1.dropna()  # returns a new Series without the NaN entries

Code results:

0    4.0
1    NaN
2    8.0
3    NaN
4    5.0
dtype: float64

0    4.0
2    8.0
4    5.0
dtype: float64
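Note that dropna() does not modify se1: it returns a new Series, and the surviving values keep their original index labels (0, 2, 4). A short sketch checking this; reset_index(drop=True) is one standard way to renumber the result if consecutive labels are wanted.

```python
import pandas as pd
from numpy import nan as NaN

se1 = pd.Series([4, NaN, 8, NaN, 5])
result = se1.dropna()

# dropna() returns a new Series; the original still has all 5 entries
print(len(se1), len(result))       # 5 3

# the surviving values keep their original index labels
print(result.index.tolist())       # [0, 2, 4]

# optional: renumber the labels 0..n-1 and discard the old index
renumbered = result.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2]
```

To overwrite se1 itself, reassign it (se1 = se1.dropna()) or pass inplace=True.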

The same result can be obtained with Boolean indexing, keeping only the entries that are not null:

se1[se1.notnull()]

Code results:

0    4.0
2    8.0
4    5.0
dtype: float64
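The Boolean mask is simply an element-wise "is not missing" test. A sketch checking that notnull() (an alias of notna()) and ~isna() produce the same mask, and that indexing with it keeps the same entries as dropna():

```python
import pandas as pd
from numpy import nan as NaN

se1 = pd.Series([4, NaN, 8, NaN, 5])

# notnull() and notna() are aliases; ~isna() gives the same mask
mask = se1.notna()
print(mask.equals(se1.notnull()))     # True
print(mask.equals(~se1.isna()))       # True

# indexing with the mask keeps the same entries as dropna()
filtered = se1[mask]
print(filtered.equals(se1.dropna()))  # True
```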

2. Processing DataFrame objects

DataFrame objects are more involved, because you may want to drop every row or column that contains any NaN, or only those that consist entirely of NaN.

df1=pd.DataFrame([[1,2,3],[NaN,NaN,2],[NaN,NaN,NaN],[8,8,NaN]])
df1

Code results:

0 1 2
0 1.0 2.0 3.0
1 NaN NaN 2.0
2 NaN NaN NaN
3 8.0 8.0 NaN

By default, dropna() drops every row that contains at least one NaN:

df1.dropna()

Code results:

0 1 2
0 1.0 2.0 3.0
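Put as a rule: the default how='any' drops a row if any of its values is NaN, so only rows in which every value is present survive. The sketch below expresses that rule as a Boolean mask and checks it against dropna(); the mask formulation is an illustration, not a description of how pandas implements it internally.

```python
import pandas as pd
from numpy import nan as NaN

df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])

# default how='any': keep only the rows where ALL values are present
default = df1.dropna()
by_mask = df1[df1.notna().all(axis=1)]

print(default.equals(df1.dropna(how="any")))  # True
print(default.equals(by_mask))                # True
print(default.index.tolist())                 # [0]
```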

Pass **how='all'** to drop only the rows in which every value is NaN:

df1.dropna(how='all')

Code results:

0 1 2
0 1.0 2.0 3.0
1 NaN NaN 2.0
3 8.0 8.0 NaN
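Conversely, how='all' drops a row only when every value in it is NaN, which is the same as keeping the rows that have at least one non-missing value. A sketch of the equivalent mask, for illustration:

```python
import pandas as pd
from numpy import nan as NaN

df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])

result = df1.dropna(how="all")
# keep a row if ANY of its values is present (i.e. not all of them are NaN)
by_mask = df1[df1.notna().any(axis=1)]

print(result.index.tolist())   # [0, 1, 3]
print(result.equals(by_mask))  # True
```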

Pass axis=1 to drop columns instead of rows. To see this, first add a column that is entirely NaN:

df1[3]=NaN
df1

Code results:

0 1 2 3
0 1.0 2.0 3.0 NaN
1 NaN NaN 2.0 NaN
2 NaN NaN NaN NaN
3 8.0 8.0 NaN NaN

df1.dropna(axis=1, how="all")

Code results:

0 1 2
0 1.0 2.0 3.0
1 NaN NaN 2.0
2 NaN NaN NaN
3 8.0 8.0 NaN

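With axis=1 the same rules apply to columns. Here only column 3 is entirely NaN, so how='all' removes just that column, while the default how='any' would remove every column, because each one contains at least one NaN. A sketch checking both cases (axis='columns' is an accepted alias for axis=1):

```python
import pandas as pd
from numpy import nan as NaN

df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])
df1[3] = NaN  # add a column that is entirely NaN

# how='all' along columns: drop a column only if every value in it is NaN
cols_all = df1.dropna(axis=1, how="all")
print(cols_all.columns.tolist())  # [0, 1, 2]

# equivalent mask: keep columns that have at least one non-missing value
assert cols_all.equals(df1.loc[:, df1.notna().any(axis=0)])
assert cols_all.equals(df1.dropna(axis="columns", how="all"))

# default how='any' along columns: every column contains a NaN, so all go
cols_any = df1.dropna(axis=1)
print(cols_any.shape)  # (4, 0)
```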
Pass thresh=n to keep only the rows that contain at least n non-NaN values:

df1.dropna(thresh=1)

Code results:

0 1 2 3
0 1.0 2.0 3.0 NaN
1 NaN NaN 2.0 NaN
3 8.0 8.0 NaN NaN

df1.dropna(thresh=3)

Code results:

0 1 2 3
0 1.0 2.0 3.0 NaN
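thresh counts the non-NaN values in each row: after the all-NaN column 3 is added, the rows hold 3, 1, 0 and 2 values respectively, so thresh=1 keeps rows 0, 1 and 3, and thresh=3 keeps only row 0. A sketch of the equivalent mask, for illustration:

```python
import pandas as pd
from numpy import nan as NaN

df1 = pd.DataFrame([[1, 2, 3], [NaN, NaN, 2], [NaN, NaN, NaN], [8, 8, NaN]])
df1[3] = NaN  # the all-NaN column added earlier

# number of non-missing values in each row
counts = df1.notna().sum(axis=1)
print(counts.tolist())  # [3, 1, 0, 2]

# thresh=n keeps a row when it has at least n non-NaN values
for n in (1, 3):
    kept = df1.dropna(thresh=n)
    print(n, kept.index.tolist())
    assert kept.equals(df1[counts >= n])
# 1 [0, 1, 3]
# 3 [0]
```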
