pandas: Detect and count NaN (missing values) with isnull(), isna()

Tags: Python, pandas

This article describes how to check whether a pandas.DataFrame or pandas.Series contains NaN and how to count the NaN values, using the isnull() and isna() methods. Note that pandas does not provide an isnan() method.

While this article primarily deals with NaN (Not a Number), it's important to note that in pandas, None is also treated as a missing value.
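
As a minimal sketch of that point, the following uses two small hypothetical Series (`s_obj` and `s_num` are illustrative names, not part of the sample data). Both None and NaN are reported as missing, and in a numeric Series None is converted to NaN on construction.

```python
import numpy as np
import pandas as pd

# Series of strings: None and NaN both count as missing.
s_obj = pd.Series(['a', None, np.nan])
print(s_obj.isna().tolist())
# [False, True, True]

# Numeric Series: None is converted to NaN when the Series is created.
s_num = pd.Series([1.0, None])
print(s_num.dtype)
# float64
print(s_num.isna().tolist())
# [False, True]
```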

The sample code in this article uses pandas version 2.0.3. The example reads a CSV file with missing values and uses only the first three rows.

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal_nan.csv')[:3]
print(df)
#       name   age state  point  other
# 0    Alice  24.0    NY    NaN    NaN
# 1      NaN   NaN   NaN    NaN    NaN
# 2  Charlie   NaN    CA    NaN    NaN
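
If the CSV file is not at hand, a DataFrame with the same missing-value pattern can be built directly. This is only a stand-in reconstructed from the printed output above, so the dtypes may differ slightly from what read_csv() produces.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the first three rows of the CSV file.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

print(df.shape)
# (3, 5)
```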

Detect NaN with isnull() and isna()

The isnull() and isna() methods are available on both DataFrame and Series; pandas has no isnan() method. Series examples appear at the end of this article.

These methods return True for missing values and False for non-missing values.

print(df.isnull())
#     name    age  state  point  other
# 0  False  False  False   True   True
# 1   True   True   True   True   True
# 2  False   True  False   True   True

print(df.isna())
#     name    age  state  point  other
# 0  False  False  False   True   True
# 1   True   True   True   True   True
# 2  False   True  False   True   True

isnull() is an alias for isna(), so the two can be used interchangeably. This article mainly uses isnull(), but you can replace it with isna() everywhere.

notnull() and notna() are also provided. These return True if the value is not missing, and False if the value is missing. notnull() is an alias for notna().

print(df.notnull())
#     name    age  state  point  other
# 0   True   True   True  False  False
# 1  False  False  False  False  False
# 2   True  False   True  False  False

print(df.notna())
#     name    age  state  point  other
# 0   True   True   True  False  False
# 1  False  False  False  False  False
# 2   True  False   True  False  False

Note that you cannot detect NaN with ==. Comparing NaN with any value, including NaN itself, using == always returns False, whereas != always returns True.

print(df == float('nan'))
#     name    age  state  point  other
# 0  False  False  False  False  False
# 1  False  False  False  False  False
# 2  False  False  False  False  False

print(df != float('nan'))
#    name   age  state  point  other
# 0  True  True   True   True   True
# 1  True  True   True   True   True
# 2  True  True   True   True   True
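
The same behavior can be seen with plain Python floats. The sketch below uses a small hypothetical Series `s` to contrast the == comparison with isna().

```python
import numpy as np
import pandas as pd

nan = float('nan')

# NaN is not equal to anything, not even to itself.
print(nan == nan)
# False
print(nan != nan)
# True

s = pd.Series([1.0, np.nan, 3.0])

# The == comparison misses the NaN; isna() finds it.
print((s == np.nan).tolist())
# [False, False, False]
print(s.isna().tolist())
# [False, True, False]
```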

Check if all elements in a row or column are NaN

all() returns True for each column (or row) in which every element is True.

By calling all() on the result of isnull(), you can check whether all the elements in each column or row are NaN.

By default, all() is applied to each column. With axis=1, it is applied to each row.

print(df.isnull().all())
# name     False
# age      False
# state    False
# point     True
# other     True
# dtype: bool

print(df.isnull().all(axis=1))
# 0    False
# 1     True
# 2    False
# dtype: bool
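
One possible use of these boolean results is to list the labels of the all-NaN columns and rows. The sketch below rebuilds the sample data inline (a stand-in for the CSV file); `all_nan_cols` and `all_nan_rows` are illustrative names.

```python
import numpy as np
import pandas as pd

# Stand-in for the first three rows of the sample CSV.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

# Use the boolean result as a mask on the column and row labels.
all_nan_cols = df.columns[df.isnull().all().to_numpy()].tolist()
print(all_nan_cols)
# ['point', 'other']

all_nan_rows = df.index[df.isnull().all(axis=1).to_numpy()].tolist()
print(all_nan_rows)
# [1]
```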

Check if a row or column contains at least one NaN

any() returns True for each column (or row) that contains at least one True.

By calling any() on the result of isnull(), you can check whether each column or row contains at least one NaN.

By default, any() is applied to each column. With axis=1, it is applied to each row.

print(df.isnull().any())
# name     True
# age      True
# state    True
# point    True
# other    True
# dtype: bool

print(df.isnull().any(axis=1))
# 0    True
# 1    True
# 2    True
# dtype: bool
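
Because every row and column of the sample data contains NaN, the result above is all True. The sketch below uses a different, hypothetical DataFrame `df2` in which only one row has a missing value, and uses the any(axis=1) result as a mask to pick out that row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: only row 1 contains a missing value.
df2 = pd.DataFrame({
    'a': [1.0, np.nan, 3.0],
    'b': ['x', 'y', 'z'],
})

print(df2.isnull().any().tolist())
# [True, False]

row_has_nan = df2.isnull().any(axis=1)
print(row_has_nan.tolist())
# [False, True, False]

# Boolean indexing keeps only the rows where the mask is True.
print(df2[row_has_nan].index.tolist())
# [1]
```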

Count NaN in each row or column

sum() calculates the sum of the elements in each column (or row).

Since sum() treats True as 1 and False as 0, you can count the number of NaN in each column or row by calling sum() on the result of isnull().

By default, NaN is counted per column. With axis=1, it is counted per row.

print(df.isnull().sum())
# name     1
# age      2
# state    1
# point    3
# other    3
# dtype: int64

print(df.isnull().sum(axis=1))
# 0    2
# 1    5
# 2    3
# dtype: int64
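
The same True=1, False=0 treatment applies to mean(), so one way to get the fraction of missing values per column is to call mean() instead of sum(). The sketch below rebuilds the sample data inline as a stand-in for the CSV file; `nan_ratio` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Stand-in for the first three rows of the sample CSV.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

# The mean of a boolean column is the share of True values.
nan_ratio = df.isnull().mean().round(2).tolist()
print(nan_ratio)
# [0.33, 0.67, 0.33, 1.0, 1.0]
```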

Count non-missing values in each row or column

count() counts the number of non-missing values (= existing values) in each column or row.

Call it directly on the original DataFrame, not on the result of isnull().

By default, non-missing values are counted per column. With axis=1, they are counted per row.

print(df.count())
# name     2
# age      1
# state    2
# point    0
# other    0
# dtype: int64

print(df.count(axis=1))
# 0    3
# 1    0
# 2    2
# dtype: int64
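
As a quick sanity check, the non-missing count and the NaN count of each column should add up to the number of rows. The sketch below rebuilds the sample data inline as a stand-in for the CSV file; `non_missing` and `missing` are illustrative names.

```python
import numpy as np
import pandas as pd

# Stand-in for the first three rows of the sample CSV.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

non_missing = df.count()
missing = df.isnull().sum()

# Every column has len(df) elements in total.
print((non_missing + missing).tolist())
# [3, 3, 3, 3, 3]
print(len(df))
# 3
```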

Count the total number of NaN

The values attribute of pandas.DataFrame returns the whole data as a NumPy array (numpy.ndarray). The to_numpy() method returns the same array and is the approach the pandas documentation recommends.

print(df.isnull().values)
# [[False False False  True  True]
#  [ True  True  True  True  True]
#  [False  True False  True  True]]

print(type(df.isnull().values))
# <class 'numpy.ndarray'>

Unlike pandas.DataFrame, sum() of numpy.ndarray calculates the sum of all elements across dimensions by default.

Therefore, by calling sum() on the values attribute (numpy.ndarray) of the result of isnull(), you can get the total number of NaN.

print(df.isnull().values.sum())
# 10
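
Two alternative spellings give the same total: chaining sum() twice within pandas (per-column counts, then their sum), or using to_numpy() in place of values. The sketch below rebuilds the sample data inline as a stand-in for the CSV file.

```python
import numpy as np
import pandas as pd

# Stand-in for the first three rows of the sample CSV.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

# Sum the per-column NaN counts.
print(df.isnull().sum().sum())
# 10

# Convert to numpy.ndarray with to_numpy(), then sum all elements.
print(df.isnull().to_numpy().sum())
# 10
```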

Count the total number of non-missing values

You can get the total number of non-missing elements by calling sum() on the per-column counts returned by count().

print(df.count().sum())
# 5

You can also call sum() on the values attribute (numpy.ndarray) of the result of notnull() or notna() (where the non-missing element is True).

print(df.notnull().values.sum())
# 5
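
A third option, sketched here as a cross-check, is to subtract the total number of NaN from the size attribute (the number of all elements). The sample data is rebuilt inline as a stand-in for the CSV file; `total_nan` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Stand-in for the first three rows of the sample CSV.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

total_nan = df.isnull().to_numpy().sum()

# All elements minus missing elements = non-missing elements.
print(df.size - total_nan)
# 5
```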

Check if pandas.DataFrame contains at least one NaN

Using the total number of NaN shown above, you can check whether a DataFrame contains at least one NaN.

If the total number of NaN is not zero, the DataFrame contains at least one NaN.

print(df.isnull().values.sum() != 0)
# True

If the total number of NaN equals the size attribute (the number of all elements), it means all elements are NaN.

print(df.size)
# 15

print(df.isnull().values.sum() == df.size)
# False
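
The same two checks can also be written without counting, by calling any() or all() on the numpy.ndarray, which by default consider all elements across dimensions. The sketch below rebuilds the sample data inline as a stand-in for the CSV file.

```python
import numpy as np
import pandas as pd

# Stand-in for the first three rows of the sample CSV.
df = pd.DataFrame({
    'name': ['Alice', np.nan, 'Charlie'],
    'age': [24.0, np.nan, np.nan],
    'state': ['NY', np.nan, 'CA'],
    'point': [np.nan, np.nan, np.nan],
    'other': [np.nan, np.nan, np.nan],
})

# True if at least one element is NaN.
print(df.isnull().values.any())
# True

# True only if every element is NaN.
print(df.isnull().values.all())
# False
```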

For pandas.Series

pandas.Series also has the isnull(), isna(), notnull(), and notna() methods. A Series can be handled in the same way as in the DataFrame examples above.

s = df['state']
print(s)
# 0     NY
# 1    NaN
# 2     CA
# Name: state, dtype: object

print(s.isnull())
# 0    False
# 1     True
# 2    False
# Name: state, dtype: bool

print(s.notnull())
# 0     True
# 1    False
# 2     True
# Name: state, dtype: bool

print(s.isnull().any())
# True

print(s.isnull().all())
# False

print(s.isnull().sum())
# 1

print(s.count())
# 2
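
Two further options for a Series, sketched on an inline stand-in for the state column: the hasnans attribute reports whether the Series contains any missing value, and the top-level pd.isna() function gives the same result as the isna() method.

```python
import numpy as np
import pandas as pd

# Stand-in for df['state'] from the sample data.
s = pd.Series(['NY', np.nan, 'CA'], name='state')

# hasnans is an attribute, not a method, so no parentheses.
print(s.hasnans)
# True
print(pd.Series(['NY', 'CA']).hasnans)
# False

# The function form pd.isna() accepts a Series as well.
print(pd.isna(s).tolist())
# [False, True, False]
```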
