NumPy: Remove NaN (np.nan) from an array

Tags: Python, NumPy

In NumPy, to remove rows or columns containing NaN (np.nan) from an array (ndarray), use np.isnan() to identify NaN and methods like any() or all() to extract rows or columns that do not contain NaN.

Additionally, you can remove all NaN values from an array, but this will flatten the array.

Basic handling of NaN in Python, and replacing NaN with other values instead of removing it, are covered in separate articles.

The NumPy version used in this article is shown below. Note that functionality may vary between versions. The examples read a CSV file containing missing data with np.genfromtxt(), which loads the missing fields as NaN.

import numpy as np

print(np.__version__)
# 1.26.1

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
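
If the CSV file is not at hand, the same array can be reproduced from an in-memory string. This is a minimal sketch: the contents of `csv_text` are an assumption inferred from the printed array above, not a copy of the actual sample_nan.csv.

```python
import io

import numpy as np

# Hypothetical CSV content standing in for sample_nan.csv.
# Empty fields between commas are the missing values.
csv_text = "11,12,,14\n21,,,24\n31,32,33,34\n"

# np.genfromtxt() accepts file-like objects and, with the default float dtype,
# fills missing fields with NaN.
a = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
```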

Remove all NaN from an array

You can use np.isnan() to check if values in an ndarray are NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]

Applying the negation operator (~) to the resulting boolean ndarray marks NaN positions as False and all other positions as True. Using it as a mask removes NaN (extracts the non-NaN values). Because the remaining elements no longer fit the original shape, the result is flattened to one dimension.

print(~np.isnan(a))
# [[ True  True False  True]
#  [ True False False  True]
#  [ True  True  True  True]]

print(a[~np.isnan(a)])
# [11. 12. 14. 21. 24. 31. 32. 33. 34.]
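
The change in shape can be checked directly. The sketch below builds the same array with np.array() instead of reading the CSV file (an assumption made to keep it self-contained) and also shows that the original array is left untouched, because boolean indexing returns a copy.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

b = a[~np.isnan(a)]

# The 2D array becomes a 1D array holding the 9 non-NaN values.
print(a.shape, b.shape)
# (3, 4) (9,)

# The original array still contains its 3 NaN values.
print(np.isnan(a).sum())
# 3
```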

Remove rows containing NaN

To remove rows containing NaN, call the any() method on the ndarray generated by np.isnan(). The any() method returns True if there is at least one True in the ndarray.

By setting axis=1 in any(), it checks whether there is at least one True in each row, indicating the presence of NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]

print(np.isnan(a).any(axis=1))
# [ True  True False]

Applying the negation operator (~) swaps True and False, so rows without any NaN become True.

print(~np.isnan(a).any(axis=1))
# [False False  True]

Using this boolean ndarray as the index for the rows (the first dimension) of the original ndarray removes rows with NaN (extracts rows without NaN).

print(a[~np.isnan(a).any(axis=1), :])
# [[31. 32. 33. 34.]]

You can omit the column specification (:) as shown below.

print(a[~np.isnan(a).any(axis=1)])
# [[31. 32. 33. 34.]]
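
One detail worth spelling out is operator precedence: a method call binds more tightly than ~, so ~np.isnan(a).any(axis=1) negates the result of any(), not the result of np.isnan(). The sketch below (variable names are illustrative only) makes the grouping explicit and compares it with the equivalent "every element is not NaN" formulation.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Explicit parentheses: identical to ~np.isnan(a).any(axis=1).
mask_no_nan = ~(np.isnan(a).any(axis=1))

# Equivalent form: a row is kept if all of its elements are non-NaN.
mask_all_valid = (~np.isnan(a)).all(axis=1)

print(mask_no_nan)
# [False False  True]

print(np.array_equal(mask_no_nan, mask_all_valid))
# True
```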

To remove only rows where all elements are NaN, use all() instead of any().

With axis=1, all() checks whether every element in each row is True. For demonstration, np.nan is first assigned to the remaining elements of the second row so that the entire row is NaN.

a[1, 0] = np.nan
a[1, 3] = np.nan
print(a)
# [[11. 12. nan 14.]
#  [nan nan nan nan]
#  [31. 32. 33. 34.]]

print(np.isnan(a).all(axis=1))
# [False  True False]

print(~np.isnan(a).all(axis=1))
# [ True False  True]

print(a[~np.isnan(a).all(axis=1)])
# [[11. 12. nan 14.]
#  [31. 32. 33. 34.]]
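
To see how any() and all() relate on the same data, it can help to count the NaN values in each row. This is only an illustrative sketch, not part of the approach described above: a row satisfies any() when its count is greater than zero, and all() when its count equals the number of columns.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [np.nan, np.nan, np.nan, np.nan],
              [31, 32, 33, 34]])

# True counts as 1, so summing the boolean mask counts NaN per row.
nan_per_row = np.isnan(a).sum(axis=1)
print(nan_per_row)
# [1 4 0]

# Same result as np.isnan(a).any(axis=1)
print(nan_per_row > 0)
# [ True  True False]

# Same result as np.isnan(a).all(axis=1)
print(nan_per_row == a.shape[1])
# [False  True False]
```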

Remove columns containing NaN

The process to remove columns containing NaN is similar to that used for rows.

Using any() with axis=0 checks if there is at least one True in each column, indicating the presence of NaN. Apply the negation operator (~) to convert columns without any NaN to True.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]

print(np.isnan(a).any(axis=0))
# [False  True  True False]

print(~np.isnan(a).any(axis=0))
# [ True False False  True]

Using this boolean ndarray as the index for the columns (the second dimension) of the original ndarray removes columns with NaN (extracts columns without NaN).

print(a[:, ~np.isnan(a).any(axis=0)])
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]
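
After columns are removed, it is often useful to know which of the original columns survived. As a small sketch (the variable name `keep` is illustrative), np.flatnonzero() converts the boolean mask into the indices of the retained columns.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Boolean mask of columns that contain no NaN.
keep = ~np.isnan(a).any(axis=0)

# Indices of the columns that are kept.
print(np.flatnonzero(keep))
# [0 3]

print(a[:, keep])
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]
```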

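To remove only columns where all elements are NaN, use all() instead of any(). In the example below, np.nan is assigned to a[2, 2] so that the third column consists entirely of NaN.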

a[2, 2] = np.nan
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. nan 34.]]

print(np.isnan(a).all(axis=0))
# [False False  True False]

print(~np.isnan(a).all(axis=0))
# [ True  True False  True]

print(a[:, ~np.isnan(a).all(axis=0)])
# [[11. 12. 14.]
#  [21. nan 24.]
#  [31. 32. 34.]]
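
The row and column cases differ only in the axis passed to any() or all() and in where the mask is applied. The following sketch wraps both in a single helper for 2D arrays. The function drop_nan() and its parameters are hypothetical names introduced here for illustration; they are not part of NumPy.

```python
import numpy as np


def drop_nan(a, axis, how="any"):
    """Drop rows (axis=0) or columns (axis=1) of a 2D array based on NaN.

    how="any" drops a row/column with at least one NaN;
    how="all" drops it only if every element is NaN.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 (rows) or 1 (columns)")
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")

    nan_mask = np.isnan(a)
    # Rows are judged by reducing across columns (axis=1), and vice versa.
    reduce_axis = 1 - axis
    if how == "any":
        drop = nan_mask.any(axis=reduce_axis)
    else:
        drop = nan_mask.all(axis=reduce_axis)

    return a[~drop] if axis == 0 else a[:, ~drop]


a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

print(drop_nan(a, axis=0))
# [[31. 32. 33. 34.]]

print(drop_nan(a, axis=1))
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]
```

With how="all", this particular array is returned with all rows and columns intact, since no row or column consists entirely of NaN.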
