pandas: Detect and count NaN (missing values) with isnull(), isna()
This article describes how to check whether a pandas.DataFrame or pandas.Series contains NaN and how to count the number of NaN values. You can use the isnull() and isna() methods. Note, however, that DataFrame and Series do not provide an isnan() method.
- Detect NaN with isnull() and isna()
- Check if all elements in a row and column are NaN
- Check if a row and column contains at least one NaN
- Count NaN in each row and column
- Count non-missing values in each row and column
- Count the total number of NaN
- Count the total number of non-missing values
- Check if pandas.DataFrame contains at least one NaN
- For pandas.Series
While this article primarily deals with NaN (Not a Number), it's important to note that in pandas, None is also treated as a missing value.
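As a minimal sketch of the point about None (the values below are made up for illustration and are not from the sample CSV), isnull() flags None as missing whether the data is numeric or text:

```python
import pandas as pd

# Illustrative data, not taken from the article's CSV file.
s_num = pd.Series([1.0, None, 3.0])  # None is stored as NaN in a float Series
s_str = pd.Series(['a', None, 'c'])  # text data with one missing entry

print(s_num.isnull().tolist())
# [False, True, False]
print(s_str.isnull().tolist())
# [False, True, False]
```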
See the following articles on how to remove and replace missing values.
- pandas: Remove NaN (missing values) with dropna()
- pandas: Replace NaN (missing values) with fillna()
The sample code in this article uses pandas version 2.0.3. As an example, read a CSV file with missing values and use the first three rows.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')[:3]
print(df)
# name age state point other
# 0 Alice 24.0 NY NaN NaN
# 1 NaN NaN NaN NaN NaN
# 2 Charlie NaN CA NaN NaN
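If the CSV file is not at hand, a DataFrame with the same three rows can be built directly. This is only a hand-made stand-in that assumes the values printed above; the dtypes may differ slightly from what read_csv() infers from the full file.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the first three rows printed above.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan] * 3,
    'other': [np.nan] * 3,
})

print(df.shape)
# (3, 5)
```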
Detect NaN with isnull() and isna()
The isnull() and isna() methods are available on both DataFrame and Series. Note that an isnan() method is not provided. Examples using Series are provided later.
- pandas.DataFrame.isnull — pandas 2.0.3 documentation
- pandas.DataFrame.isna — pandas 2.0.3 documentation
These methods return True for missing values and False for non-missing values.
print(df.isnull())
# name age state point other
# 0 False False False True True
# 1 True True True True True
# 2 False True False True True
print(df.isna())
# name age state point other
# 0 False False False True True
# 1 True True True True True
# 2 False True False True True
isnull() is an alias for isna(), and the two can be used interchangeably. This article mainly uses isnull(), but you can replace it with isna().
notnull() and notna() are also provided. These return True if the value is not missing, and False if the value is missing. notnull() is an alias for notna().
- pandas.DataFrame.notnull — pandas 2.0.3 documentation
- pandas.DataFrame.notna — pandas 2.0.3 documentation
print(df.notnull())
# name age state point other
# 0 True True True False False
# 1 False False False False False
# 2 True False True False False
print(df.notna())
# name age state point other
# 0 True True True False False
# 1 False False False False False
# 2 True False True False False
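To make the relationship between the two families of methods concrete, the following sketch (using a small made-up DataFrame) checks that notnull() gives the same result as inverting isnull() with the ~ operator:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the article's CSV file.
df = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', None]})

inverted = ~df.isnull()  # flip True/False element-wise
print(inverted.equals(df.notnull()))
# True
```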
Note that comparing NaN with any value using == always returns False, whereas != always returns True. This is why equality comparison cannot be used to detect NaN.
print(df == float('nan'))
# name age state point other
# 0 False False False False False
# 1 False False False False False
# 2 False False False False False
print(df != float('nan'))
# name age state point other
# 0 True True True True True
# 1 True True True True True
# 2 True True True True True
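The same behavior can be seen with a plain Python float, independent of any DataFrame. The short sketch below also shows two functions that do detect NaN in a scalar, math.isnan() from the standard library and pandas.isna():

```python
import math
import pandas as pd

nan = float('nan')

print(nan == nan)       # NaN is not equal to anything, including itself
# False
print(nan != nan)
# True
print(math.isnan(nan))  # standard-library check for a float NaN
# True
print(pd.isna(nan))     # pandas top-level function, also works on scalars
# True
```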
Check if all elements in a row and column are NaN
all() returns True if all elements in each row and column are True.
By calling all() on the result of isnull(), you can check if all the elements in each row and column are NaN.
By default, it is applied to columns. If axis=1, it is applied to rows.
print(df.isnull().all())
# name False
# age False
# state False
# point True
# other True
# dtype: bool
print(df.isnull().all(axis=1))
# 0 False
# 1 True
# 2 False
# dtype: bool
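One possible use of these results is as a boolean mask for selecting data. The following is a sketch on a reduced, hand-built version of the sample data (three columns only) that keeps the columns and rows that are not entirely NaN:

```python
import numpy as np
import pandas as pd

# Reduced, hand-built version of the sample data.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'point': [np.nan, np.nan, np.nan],
})

all_nan_cols = df.isnull().all()        # one flag per column
kept_cols = df.loc[:, ~all_nan_cols]    # drop columns that are all NaN
print(kept_cols.columns.tolist())
# ['name', 'age']

all_nan_rows = df.isnull().all(axis=1)  # one flag per row
kept_rows = df[~all_nan_rows]           # drop rows that are all NaN
print(kept_rows.index.tolist())
# [0, 2]
```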
Check if a row and column contains at least one NaN
any() returns True if there is at least one True in each row and column.
By calling any() on the result of isnull(), you can check if each row and column contains at least one NaN.
By default, it is applied to columns. If axis=1, it is applied to rows.
print(df.isnull().any())
# name True
# age True
# state True
# point True
# other True
# dtype: bool
print(df.isnull().any(axis=1))
# 0 True
# 1 True
# 2 True
# dtype: bool
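Because every row and column of the sample data contains a NaN, the results above are all True. The sketch below uses a different, made-up DataFrame in which only some rows and columns have missing values, and uses the any() results to pick them out:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the article's CSV file.
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age': [24.0, np.nan, 30.0],
    'state': ['NY', 'TX', None],
})

rows_with_nan = df.isnull().any(axis=1)  # one flag per row
print(df[rows_with_nan].index.tolist())
# [1, 2]

cols_with_nan = df.isnull().any()        # one flag per column
print(cols_with_nan[cols_with_nan].index.tolist())
# ['age', 'state']
```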
Count NaN in each row and column
sum() calculates the sum of elements for each row and column.
Since sum() treats True as 1 and False as 0, you can count the number of NaN in each row and column by calling sum() on the result of isnull().
By default, NaN is counted in each column; with axis=1, it is counted in each row.
print(df.isnull().sum())
# name 1
# age 2
# state 1
# point 3
# other 3
# dtype: int64
print(df.isnull().sum(axis=1))
# 0 2
# 1 5
# 2 3
# dtype: int64
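The same True=1, False=0 treatment applies to mean(), so the fraction of missing values can be obtained in the same way. This is a small sketch on made-up data, chosen so that the ratios are round numbers:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the article's CSV file.
df = pd.DataFrame({
    'a': [1.0, np.nan, np.nan, 4.0],
    'b': [np.nan, np.nan, np.nan, np.nan],
    'c': [1.0, 2.0, 3.0, 4.0],
})

ratio = df.isnull().mean()  # fraction of NaN per column
print(ratio.tolist())
# [0.5, 1.0, 0.0]
```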
Count non-missing values in each row and column
count() counts the number of non-missing values (= existing values) in each row and column.
Call it directly on the original DataFrame, not on the result of isnull().
By default, non-missing values are counted in each column; with axis=1, they are counted in each row.
print(df.count())
# name 2
# age 1
# state 2
# point 0
# other 0
# dtype: int64
print(df.count(axis=1))
# 0 3
# 1 0
# 2 2
# dtype: int64
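As a quick consistency check, the per-column result of count() and the per-column number of NaN should add up to the number of rows. The sketch below verifies this on a small made-up DataFrame:

```python
import numpy as np
import pandas as pd

# Illustrative data, not taken from the article's CSV file.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'point': [np.nan, np.nan, np.nan],
})

n_present = df.count()         # non-missing values per column
n_missing = df.isnull().sum()  # missing values per column

print(n_present.tolist())
# [2, 1, 0]
print((n_present + n_missing).tolist())  # equals len(df) for every column
# [3, 3, 3]
```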
Count the total number of NaN
You can get the whole data as a NumPy array (numpy.ndarray) with the values attribute of pandas.DataFrame.
print(df.isnull().values)
# [[False False False True True]
# [ True True True True True]
# [False True False True True]]
print(type(df.isnull().values))
# <class 'numpy.ndarray'>
Unlike pandas.DataFrame, sum() of numpy.ndarray calculates the sum of all elements across dimensions by default.
Therefore, by calling sum() on the values attribute (numpy.ndarray) of the result of isnull(), you can get the total number of NaN.
print(df.isnull().values.sum())
# 10
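There are other ways to reach the same total. The sketch below, run on a hand-built copy of the sample data, compares the approach above with calling sum() twice on the DataFrame and with the to_numpy() method; which one to use is largely a matter of preference.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the sample data used in this article.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan] * 3,
    'other': [np.nan] * 3,
})

total_values = df.isnull().values.sum()      # ndarray sum over all elements
total_twice = df.isnull().sum().sum()        # per-column sums, then their sum
total_numpy = df.isnull().to_numpy().sum()   # to_numpy() instead of values

print(int(total_values), int(total_twice), int(total_numpy))
# 10 10 10
```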
Count the total number of non-missing values
You can get the total number of non-missing values by calling sum() on the per-column counts returned by count().
print(df.count().sum())
# 5
You can also call sum() on the values attribute (numpy.ndarray) of the result of notnull() or notna() (where non-missing elements are True).
print(df.notnull().values.sum())
# 5
Check if pandas.DataFrame contains at least one NaN
Using the total number of NaN shown above, you can check if a DataFrame contains at least one NaN.
If the total number of NaN is not zero, the DataFrame contains at least one NaN.
print(df.isnull().values.sum() != 0)
# True
If the total number of NaN equals the size attribute (the number of all elements), it means all elements are NaN.
print(df.size)
# 15
print(df.isnull().values.sum() == df.size)
# False
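An alternative that avoids counting is to call any() or all() on the NumPy array itself, since both reduce over all elements by default. The following is a sketch on a hand-built copy of the sample data:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the sample data used in this article.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan] * 3,
    'other': [np.nan] * 3,
})

mask = df.isnull().values  # 2D boolean numpy.ndarray

has_any_nan = bool(mask.any())  # True if at least one element is NaN
is_all_nan = bool(mask.all())   # True only if every element is NaN

print(has_any_nan)
# True
print(is_all_nan)
# False
```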
For pandas.Series
pandas.Series also has methods such as isnull(), isna(), notnull(), and notna(). It can be handled in the same way as the above examples of pandas.DataFrame.
- pandas.Series.isnull — pandas 2.0.3 documentation
- pandas.Series.isna — pandas 2.0.3 documentation
- pandas.Series.notnull — pandas 2.0.3 documentation
- pandas.Series.notna — pandas 2.0.3 documentation
s = df['state']
print(s)
# 0 NY
# 1 NaN
# 2 CA
# Name: state, dtype: object
print(s.isnull())
# 0 False
# 1 True
# 2 False
# Name: state, dtype: bool
print(s.notnull())
# 0 True
# 1 False
# 2 True
# Name: state, dtype: bool
print(s.isnull().any())
# True
print(s.isnull().all())
# False
print(s.isnull().sum())
# 1
print(s.count())
# 2
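The boolean Series returned by isnull() and notnull() can also be used to filter the original Series. The sketch below rebuilds the state column by hand to show this:

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the 'state' column shown above.
s = pd.Series(['NY', np.nan, 'CA'], name='state')

present = s[s.notnull()]  # keep only non-missing values
print(present.tolist())
# ['NY', 'CA']

missing_index = s[s.isnull()].index.tolist()  # labels of missing values
print(missing_index)
# [1]
```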