pandas: Find rows/columns with NaN (missing values)
You can find rows/columns containing NaN in pandas.DataFrame using the isnull() or isna() method that checks if an element is a missing value.
While this article primarily deals with NaN (Not a Number), it's important to note that in pandas, None is also treated as a missing value.
Use the dropna() method to retain rows/columns where all elements are non-missing values, i.e., remove rows/columns containing missing values.
The sample code in this article uses pandas version 2.0.3. As an example, read a CSV file with missing values.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')
print(df)
# name age state point other
# 0 Alice 24.0 NY NaN NaN
# 1 NaN NaN NaN NaN NaN
# 2 Charlie NaN CA NaN NaN
# 3 Dave 68.0 TX 70.0 NaN
# 4 Ellen NaN CA 88.0 NaN
# 5 Frank 30.0 NaN NaN NaN
Find rows/columns with NaN in specific columns/rows
You can use the isnull() or isna() method of pandas.DataFrame and Series to check if each element is a missing value or not.
print(df.isnull())
# name age state point other
# 0 False False False True True
# 1 True True True True True
# 2 False True False True True
# 3 False False False False True
# 4 False True False False True
# 5 False False True True True
isnull() is an alias for isna(), and both are used interchangeably. isnull() is mainly used in this article, but you can replace it with isna().
If you want to find rows with NaN in a specific column, use the result of isnull() for that column.
print(df['point'].isnull())
# 0 True
# 1 True
# 2 True
# 3 False
# 4 False
# 5 True
# Name: point, dtype: bool
print(df[df['point'].isnull()])
# name age state point other
# 0 Alice 24.0 NY NaN NaN
# 1 NaN NaN NaN NaN NaN
# 2 Charlie NaN CA NaN NaN
# 5 Frank 30.0 NaN NaN NaN
The concept is the same when finding columns with NaN in a specific row. Use loc[] to select by name (label), and iloc[] to select by position.
print(df.iloc[2].isnull())
# name False
# age True
# state False
# point True
# other True
# Name: 2, dtype: bool
print(df.loc[:, df.iloc[2].isnull()])
# age point other
# 0 24.0 NaN NaN
# 1 NaN NaN NaN
# 2 NaN NaN NaN
# 3 68.0 70.0 NaN
# 4 NaN 88.0 NaN
# 5 30.0 NaN NaN
Find rows/columns with at least one NaN
To use as an example, remove rows and columns where all values are NaN.
df2 = df.dropna(how='all').dropna(how='all', axis=1)
print(df2)
# name age state point
# 0 Alice 24.0 NY NaN
# 2 Charlie NaN CA NaN
# 3 Dave 68.0 TX 70.0
# 4 Ellen NaN CA 88.0
# 5 Frank 30.0 NaN NaN
Use the any() method that returns True if there is at least one True in each row/column. By default, it is applied to columns. If axis=1, it is applied to rows.
By calling any() on the result of isnull(), you can check if each row and column contains at least one NaN.
Find rows that contain at least one NaN:
print(df2.isnull())
# name age state point
# 0 False False False True
# 2 False True False True
# 3 False False False False
# 4 False True False False
# 5 False False True True
print(df2.isnull().any(axis=1))
# 0 True
# 2 True
# 3 False
# 4 True
# 5 True
# dtype: bool
print(df2[df2.isnull().any(axis=1)])
# name age state point
# 0 Alice 24.0 NY NaN
# 2 Charlie NaN CA NaN
# 4 Ellen NaN CA 88.0
# 5 Frank 30.0 NaN NaN
Find columns that contain at least one NaN:
print(df2.isnull().any())
# name False
# age True
# state True
# point True
# dtype: bool
print(df2.loc[:, df2.isnull().any()])
# age state point
# 0 24.0 NY NaN
# 2 NaN CA NaN
# 3 68.0 TX 70.0
# 4 NaN CA 88.0
# 5 30.0 NaN NaN