pandas: Detect and count NaN (missing values) with isnull(), isna()
This article describes how to check whether a pandas.DataFrame or pandas.Series contains NaN and how to count the NaN values, using the isnull() and isna() methods. Note that pandas does not provide an isnan() method.
- Detect NaN with isnull() and isna()
- Check if all elements in a row and column are NaN
- Check if a row and column contains at least one NaN
- Count NaN in each row and column
- Count non-missing values in each row and column
- Count the total number of NaN
- Count the total number of non-missing values
- Check if pandas.DataFrame contains at least one NaN
- For pandas.Series
While this article primarily deals with NaN (Not a Number), it's important to note that in pandas, None is also treated as a missing value.
See the following articles on how to remove and replace missing values.
- pandas: Remove NaN (missing values) with dropna()
- pandas: Replace NaN (missing values) with fillna()
The sample code in this article uses pandas version 2.0.3. As an example, read a CSV file with missing values and use the first three rows.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')[:3]
print(df)
# name age state point other
# 0 Alice 24.0 NY NaN NaN
# 1 NaN NaN NaN NaN NaN
# 2 Charlie NaN CA NaN NaN
Detect NaN with isnull() and isna()
The isnull() and isna() methods are available in both DataFrame and Series. Note that the isnan() method is not provided. Examples using Series are provided later.
- pandas.DataFrame.isnull — pandas 2.0.3 documentation
- pandas.DataFrame.isna — pandas 2.0.3 documentation
These methods return True for missing values and False for non-missing values.
print(df.isnull())
# name age state point other
# 0 False False False True True
# 1 True True True True True
# 2 False True False True True
print(df.isna())
# name age state point other
# 0 False False False True True
# 1 True True True True True
# 2 False True False True True
isnull() is an alias for isna(), and both are used interchangeably. isnull() is mainly used in this article, but you can replace it with isna().
notnull() and notna() are also provided. These return True if the value is not missing, and False if the value is missing. notnull() is an alias for notna().
- pandas.DataFrame.notnull — pandas 2.0.3 documentation
- pandas.DataFrame.notna — pandas 2.0.3 documentation
print(df.notnull())
# name age state point other
# 0 True True True False False
# 1 False False False False False
# 2 True False True False False
print(df.notna())
# name age state point other
# 0 True True True False False
# 1 False False False False False
# 2 True False True False False
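The relationships described above can be checked on a small, self-contained example. This is only a sketch: df_demo is hypothetical data made up for illustration (not the article's CSV), chosen so that one column holds NaN and another holds None.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the article's CSV): NaN and None side by side.
df_demo = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': ['x', None, 'z']})

# isnull() and isna() return identical boolean DataFrames.
same = df_demo.isnull().equals(df_demo.isna())
print(same)
# True

# notna() is the element-wise negation of isna().
inverse = df_demo.notna().equals(~df_demo.isna())
print(inverse)
# True

# None in the second column is flagged as missing, just like NaN.
print(df_demo['b'].isna().tolist())
# [False, True, False]
```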
Note that comparing NaN with any value (including NaN itself) using == always returns False, whereas != always returns True, so you cannot detect NaN with ==.
print(df == float('nan'))
# name age state point other
# 0 False False False False False
# 1 False False False False False
# 2 False False False False False
print(df != float('nan'))
# name age state point other
# 0 True True True True True
# 1 True True True True True
# 2 True True True True True
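The same behavior can be seen with a plain Python float, which may make it clearer why a dedicated check is needed. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
import math
import pandas as pd

nan = float('nan')

# NaN is not equal to anything, including itself.
print(nan == nan)
# False
print(nan != nan)
# True

# Use a dedicated check for scalars instead of ==.
print(math.isnan(nan))
# True
print(pd.isna(nan))
# True

# The same applies element-wise to a Series.
s_demo = pd.Series([1.0, nan])
print((s_demo == nan).tolist())
# [False, False]
print(s_demo.isna().tolist())
# [False, True]
```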
Check if all elements in a row and column are NaN
all() returns True if all elements in each row and column are True.
By calling all() on the result of isnull(), you can check if all the elements in each row and column are NaN.
By default, it is applied to columns. If axis=1, it is applied to rows.
print(df.isnull().all())
# name False
# age False
# state False
# point True
# other True
# dtype: bool
print(df.isnull().all(axis=1))
# 0 False
# 1 True
# 2 False
# dtype: bool
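One possible use of these results is to list the labels of the rows and columns that are entirely NaN. The sketch below rebuilds the three-row sample by hand instead of reading the CSV file (an assumption made so the example is self-contained); col_mask and row_mask are illustrative names.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the article's three-row sample (an assumption).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

col_mask = df.isnull().all()        # one boolean per column
row_mask = df.isnull().all(axis=1)  # one boolean per row

# Keep only the True entries and read off their labels.
print(col_mask[col_mask].index.tolist())
# ['point', 'other']
print(row_mask[row_mask].index.tolist())
# [1]
```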
Check if a row and column contains at least one NaN
any() returns True if there is at least one True in each row and column.
By calling any() on the result of isnull(), you can check if each row and column contains at least one NaN.
By default, it is applied to columns. If axis=1, it is applied to rows.
print(df.isnull().any())
# name True
# age True
# state True
# point True
# other True
# dtype: bool
print(df.isnull().any(axis=1))
# 0 True
# 1 True
# 2 True
# dtype: bool
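The row-wise result can also serve as a boolean mask for selecting rows. The following sketch assumes a hand-built copy of the sample data and restricts it to three columns (the subset sub is an illustrative choice) so that not every row contains NaN.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the article's three-row sample (an assumption).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

# Look only at three columns so that row 0 is complete.
sub = df[['name', 'age', 'state']]
has_nan = sub.isnull().any(axis=1)

# Rows with at least one NaN.
print(sub[has_nan].index.tolist())
# [1, 2]

# Rows without any NaN.
print(sub[~has_nan].index.tolist())
# [0]
```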
Count NaN in each row and column
sum() calculates the sum of elements for each row and column.
Since sum() treats True as 1 and False as 0, you can count the number of NaN in each row and column by calling sum() on the result of isnull().
You can count NaN in each column by default, and in each row with axis=1.
print(df.isnull().sum())
# name 1
# age 2
# state 1
# point 3
# other 3
# dtype: int64
print(df.isnull().sum(axis=1))
# 0 2
# 1 5
# 2 3
# dtype: int64
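Because True is treated as 1 and False as 0, the same idea extends to mean(), which gives the fraction of missing values rather than the count. This is a sketch on a hand-built copy of the sample data (an assumption), not something the original example covers.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the article's three-row sample (an assumption).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

# Fraction of missing values in each row (5 columns per row).
row_ratio = df.isnull().mean(axis=1)
print(row_ratio.tolist())
# [0.4, 1.0, 0.6]

# Fraction of missing values in each column (3 rows per column).
col_ratio = df.isnull().mean()
print(col_ratio.round(2).tolist())
# [0.33, 0.67, 0.33, 1.0, 1.0]
```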
Count non-missing values in each row and column
count() counts the number of non-missing values (= existing values) in each row and column.
Call it directly on the original DataFrame, not the result of isnull().
You can count non-missing values in each column by default, and in each row with axis=1.
print(df.count())
# name 2
# age 1
# state 2
# point 0
# other 0
# dtype: int64
print(df.count(axis=1))
# 0 3
# 1 0
# 2 2
# dtype: int64
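As a sanity check, the non-missing counts from count() and the NaN counts from isnull().sum() should add up to the number of rows (per column) or the number of columns (per row). A minimal sketch, again on a hand-built copy of the sample data (an assumption):

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the article's three-row sample (an assumption).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

# Per column: missing + non-missing equals the number of rows.
missing_per_col = df.isnull().sum()
present_per_col = df.count()
print((missing_per_col + present_per_col == len(df)).all())
# True

# Per row: missing + non-missing equals the number of columns.
missing_per_row = df.isnull().sum(axis=1)
present_per_row = df.count(axis=1)
print((missing_per_row + present_per_row == df.shape[1]).all())
# True
```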
Count the total number of NaN
You can get the entire data as a NumPy array (numpy.ndarray) with the values attribute of pandas.DataFrame.
print(df.isnull().values)
# [[False False False True True]
# [ True True True True True]
# [False True False True True]]
print(type(df.isnull().values))
# <class 'numpy.ndarray'>
Unlike pandas.DataFrame, sum() of numpy.ndarray calculates the sum of all elements across dimensions by default.
Therefore, by calling sum() on the values attribute (numpy.ndarray) of the result of isnull(), you can get the total number of NaN.
print(df.isnull().values.sum())
# 10
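There are other ways to arrive at the same total. The sketch below compares three of them on a hand-built copy of the sample data (an assumption): the values attribute used above, chaining sum() twice, and the to_numpy() method.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the article's three-row sample (an assumption).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

total_a = df.isnull().values.sum()      # ndarray sum over all elements
total_b = df.isnull().sum().sum()       # per-column counts, then their sum
total_c = df.isnull().to_numpy().sum()  # to_numpy() instead of values

print(total_a, total_b, total_c)
# 10 10 10
```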
Count the total number of non-missing values
You can get the total number of non-missing values by calling sum() on the per-column counts returned by count().
print(df.count().sum())
# 5
You can also call sum() on the values attribute (numpy.ndarray) of the result of notnull() or notna() (where the non-missing element is True).
print(df.notnull().values.sum())
# 5
Check if pandas.DataFrame contains at least one NaN
Using the total number of NaN shown above, you can check if DataFrame contains at least one NaN.
If the total number of NaN is not zero, it means DataFrame contains at least one NaN.
print(df.isnull().values.sum() != 0)
# True
If the total number of NaN equals the size attribute (the number of all elements), it means all elements are NaN.
print(df.size)
# 15
print(df.isnull().values.sum() == df.size)
# False
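As an alternative to comparing totals, any() and all() of numpy.ndarray also operate on all elements by default, so they can answer the same two questions directly. This is a sketch on a hand-built copy of the sample data (an assumption), not the approach used in the example above.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the article's three-row sample (an assumption).
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

# True if at least one element is NaN.
contains_nan = df.isnull().values.any()
print(contains_nan)
# True

# True only if every element is NaN.
all_nan = df.isnull().values.all()
print(all_nan)
# False
```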
For pandas.Series
pandas.Series also provides isnull(), isna(), notnull(), and notna(). They can be used in the same way as in the pandas.DataFrame examples above.
- pandas.Series.isnull — pandas 2.0.3 documentation
- pandas.Series.isna — pandas 2.0.3 documentation
- pandas.Series.notnull — pandas 2.0.3 documentation
- pandas.Series.notna — pandas 2.0.3 documentation
s = df['state']
print(s)
# 0 NY
# 1 NaN
# 2 CA
# Name: state, dtype: object
print(s.isnull())
# 0 False
# 1 True
# 2 False
# Name: state, dtype: bool
print(s.notnull())
# 0 True
# 1 False
# 2 True
# Name: state, dtype: bool
print(s.isnull().any())
# True
print(s.isnull().all())
# False
print(s.isnull().sum())
# 1
print(s.count())
# 2
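A few more Series-level checks may be convenient. The sketch below builds the state column by hand (an assumption, so that it runs without the CSV file) and uses the hasnans property of pandas.Series, which is not covered in the examples above.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the 'state' column (an assumption).
s = pd.Series(['NY', np.nan, 'CA'], name='state')

# hasnans is True if the Series contains at least one missing value.
print(s.hasnans)
# True

# Index labels of the missing entries.
print(s[s.isnull()].index.tolist())
# [1]

# Number of missing values as size minus the non-missing count.
print(s.size - s.count())
# 1
```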