NumPy: Functions ignoring NaN (np.nansum, np.nanmean, etc.)


In NumPy, functions like np.sum() and np.mean() return NaN if the array (ndarray) contains any NaN values. To perform calculations that ignore NaN, use functions such as np.nansum() and np.nanmean().

The NumPy version used in this article is shown below. Note that behavior may differ between versions.

As an example, consider a CSV file containing missing values, data/src/sample_nan.csv, read with np.genfromtxt().

import numpy as np

print(np.__version__)
# 1.26.1

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
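The CSV file itself is not shown here. To follow along without it, one option is to pass a file-like object to np.genfromtxt(), which accepts these as well as file paths. The sketch below assumes the file stores missing values as empty fields; the CSV text is a hypothetical reconstruction from the printed array, not the actual file.

```python
import io

import numpy as np

# Hypothetical CSV content reconstructed from the printed array;
# missing values are assumed to be empty fields.
csv_text = """11,12,,14
21,,,24
31,32,33,34
"""

# With the default float dtype, np.genfromtxt() fills empty fields with NaN.
a = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
```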

Calculate the sum ignoring NaN: np.nansum()

If the ndarray contains NaN, calculating the sum using the np.sum() function or the sum() method of ndarray returns NaN.

print(np.sum(a))
# nan

print(a.sum())
# nan
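This happens because any arithmetic involving NaN produces NaN, so a single missing value propagates to the total. A minimal sketch of that propagation, and of checking for NaN with np.isnan() before deciding which function to use:

```python
import numpy as np

# Same values as the array read from the CSV file above
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Any arithmetic with NaN yields NaN, so one NaN makes the whole sum NaN.
print(np.nan + 1)
# nan

# np.isnan() returns a boolean array marking the NaN positions.
print(np.isnan(a).any())
# True

print(np.count_nonzero(np.isnan(a)))
# 3
```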

To calculate the sum ignoring NaN, use the np.nansum() function.

print(np.nansum(a))
# 212.0
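In effect, np.nansum() treats NaN as zero. The sketch below compares it with two manual approaches: summing only the non-NaN elements selected by a boolean mask, and replacing NaN with 0 via np.where(). These are for comparison only and are not necessarily how np.nansum() works internally.

```python
import numpy as np

# Same values as the array read from the CSV file above
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Keep only non-NaN elements (the result is a 1-D array), then sum.
mask = ~np.isnan(a)
print(np.sum(a[mask]))
# 212.0

# Replace NaN with 0, then sum as usual.
print(np.sum(np.where(np.isnan(a), 0, a)))
# 212.0

print(np.nansum(a))
# 212.0
```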

As with np.sum(), the axis argument computes sums along a given axis: axis=0 gives column sums and axis=1 gives row sums. The keepdims argument can also be specified.

print(np.nansum(a, axis=0))
# [63. 44. 33. 72.]

print(np.nansum(a, axis=1))
# [ 37.  45. 130.]
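With keepdims=True, the reduced axis is kept with length 1, so the result broadcasts against the original array. The sketch below divides each row by its NaN-ignoring row sum; this normalization is just an illustrative use chosen here.

```python
import numpy as np

# Same values as the array read from the CSV file above
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

print(np.nansum(a, axis=1).shape)
# (3,)

row_sums = np.nansum(a, axis=1, keepdims=True)
print(row_sums.shape)
# (3, 1)

# Broadcasting (3, 4) / (3, 1) divides each row by its own sum;
# NaN entries remain NaN.
ratios = a / row_sums
print(np.nansum(ratios, axis=1))  # each row sums to (approximately) 1
```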

There is no nansum() method for ndarray.
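You can confirm this by checking the attributes of an ndarray. The NaN-ignoring variants exist only as module-level functions, so pass the array to np.nansum() instead.

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0])

print(hasattr(a, 'sum'))
# True

print(hasattr(a, 'nansum'))
# False

# Call the module-level function rather than a method.
print(np.nansum(a))
# 4.0
```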

Functions ignoring NaN: np.nanmean(), np.nanmax(), np.nanmin(), etc.

For functions like np.mean(), np.max(), and np.min(), there are alternatives that ignore NaN. These include np.nanmean(), np.nanmax(), and np.nanmin(), among others.

print(np.nanmean(a))
# 23.555555555555557

print(np.nanmax(a))
# 34.0

print(np.nanmin(a))
# 11.0

print(np.nanstd(a))
# 8.908312112367753

print(np.nanvar(a))
# 79.35802469135803
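These results match applying the ordinary functions to only the non-NaN values, which a boolean mask makes easy to check. Like np.std() and np.var(), np.nanstd() and np.nanvar() default to ddof=0 (population statistics); pass ddof=1 for the sample versions, where N counts only the non-NaN values. A sketch comparing both:

```python
import numpy as np

# Same values as the array read from the CSV file above
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

valid = a[~np.isnan(a)]  # 1-D array of the 9 non-NaN values

print(np.isclose(np.nanmean(a), np.mean(valid)))
# True

print(np.isclose(np.nanstd(a), np.std(valid)))
# True

# ddof=1 divides by (N - 1) instead of N.
print(np.nanvar(a, ddof=1))
print(np.var(valid, ddof=1))
```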

Like np.nansum(), all of these functions accept arguments such as axis and keepdims.
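For example, the per-row mean and the per-column maximum ignoring NaN can be computed as shown below. The sketch also uses two other NaN-ignoring functions in NumPy, np.nanmedian() and np.nanargmax(); without axis, np.nanargmax() returns an index into the flattened array, which np.unravel_index() converts back to a row and column.

```python
import numpy as np

# Same values as the array read from the CSV file above
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Row means over non-NaN values: 37/3, 45/2, 130/4
print(np.nanmean(a, axis=1))

print(np.nanmax(a, axis=0, keepdims=True))
# [[31. 32. 33. 34.]]

print(np.nanmedian(a))
# 24.0

print(np.nanargmax(a))
# 11

row, col = np.unravel_index(np.nanargmax(a), a.shape)
print(row, col)
# 2 3
```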
