pandas: Find rows/columns with NaN (missing values)
You can find rows/columns containing NaN
in pandas.DataFrame
using the isnull()
or isna()
method that checks if an element is a missing value.
While this article primarily deals with NaN
(Not a Number), it's important to note that in pandas, None
is also treated as a missing value.
Use the dropna()
method to retain rows/columns where all elements are non-missing values, i.e., remove rows/columns containing missing values.
The sample code in this article uses pandas version 2.0.3
. As an example, read a CSV file with missing values.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')
print(df)
# name age state point other
# 0 Alice 24.0 NY NaN NaN
# 1 NaN NaN NaN NaN NaN
# 2 Charlie NaN CA NaN NaN
# 3 Dave 68.0 TX 70.0 NaN
# 4 Ellen NaN CA 88.0 NaN
# 5 Frank 30.0 NaN NaN NaN
Find rows/columns with NaN
in specific columns/rows
You can use the isnull()
or isna()
method of pandas.DataFrame
and Series
to check if each element is a missing value or not.
print(df.isnull())
# name age state point other
# 0 False False False True True
# 1 True True True True True
# 2 False True False True True
# 3 False False False False True
# 4 False True False False True
# 5 False False True True True
isnull()
is an alias for isna()
, and both are used interchangeably. isnull()
is mainly used in this article, but you can replace it with isna()
.
If you want to find rows with NaN
in a specific column, use the result of isnull()
for that column.
print(df['point'].isnull())
# 0 True
# 1 True
# 2 True
# 3 False
# 4 False
# 5 True
# Name: point, dtype: bool
print(df[df['point'].isnull()])
# name age state point other
# 0 Alice 24.0 NY NaN NaN
# 1 NaN NaN NaN NaN NaN
# 2 Charlie NaN CA NaN NaN
# 5 Frank 30.0 NaN NaN NaN
The concept is the same when finding columns with NaN
in a specific row. Use loc[]
to select by name (label), and iloc[]
to select by position.
print(df.iloc[2].isnull())
# name False
# age True
# state False
# point True
# other True
# Name: 2, dtype: bool
print(df.loc[:, df.iloc[2].isnull()])
# age point other
# 0 24.0 NaN NaN
# 1 NaN NaN NaN
# 2 NaN NaN NaN
# 3 68.0 70.0 NaN
# 4 NaN 88.0 NaN
# 5 30.0 NaN NaN
Find rows/columns with at least one NaN
To use as an example, remove rows and columns where all values are NaN
.
df2 = df.dropna(how='all').dropna(how='all', axis=1)
print(df2)
# name age state point
# 0 Alice 24.0 NY NaN
# 2 Charlie NaN CA NaN
# 3 Dave 68.0 TX 70.0
# 4 Ellen NaN CA 88.0
# 5 Frank 30.0 NaN NaN
Use the any()
method that returns True
if there is at least one True
in each row/column. By default, it is applied to columns. If axis=1
, it is applied to rows.
By calling any()
on the result of isnull()
, you can check if each row and column contains at least one NaN
.
Find rows that contain at least one NaN
:
print(df2.isnull())
# name age state point
# 0 False False False True
# 2 False True False True
# 3 False False False False
# 4 False True False False
# 5 False False True True
print(df2.isnull().any(axis=1))
# 0 True
# 2 True
# 3 False
# 4 True
# 5 True
# dtype: bool
print(df2[df2.isnull().any(axis=1)])
# name age state point
# 0 Alice 24.0 NY NaN
# 2 Charlie NaN CA NaN
# 4 Ellen NaN CA 88.0
# 5 Frank 30.0 NaN NaN
Find columns that contain at least one NaN
:
print(df2.isnull().any())
# name False
# age True
# state True
# point True
# dtype: bool
print(df2.loc[:, df2.isnull().any()])
# age state point
# 0 24.0 NY NaN
# 2 NaN CA NaN
# 3 68.0 TX 70.0
# 4 NaN CA 88.0
# 5 30.0 NaN NaN