NumPy: Remove NaN (np.nan) from an array
In NumPy, to remove rows or columns containing NaN (np.nan) from an array (ndarray), use np.isnan() to identify NaN, then use methods like any() or all() to extract the rows or columns that do not contain NaN.
Additionally, you can remove all NaN values from an array, but this will flatten the array to one dimension.
For basics on handling NaN in Python, refer to the following article.
For replacing NaN with other values instead of removing them, refer to the following article.
The NumPy version used in this article is as follows. Note that functionality may vary between versions.
For example, consider reading the following CSV file, which contains missing data, using np.genfromtxt().
import numpy as np
print(np.__version__)
# 1.26.1
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
# [21. nan nan 24.]
# [31. 32. 33. 34.]]
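The contents of sample_nan.csv are not listed in this article. As a rough sketch, the same array can be reproduced without the file by reading from an in-memory string. The CSV text below is an assumption (empty fields standing in for the missing data), chosen so that it matches the printed output above.

```python
import io

import numpy as np

# Assumed file contents: empty fields represent the missing data.
csv_text = "11,12,,14\n21,,,24\n31,32,33,34\n"

# np.genfromtxt() accepts a file-like object as well as a path.
# With the default float dtype, missing fields are read as nan.
a = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(a)
```

This makes the remaining examples easy to try without creating the file on disk.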
Remove all NaN from an array
You can use np.isnan() to check whether each value in an ndarray is NaN.
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
# [21. nan nan 24.]
# [31. 32. 33. 34.]]
print(np.isnan(a))
# [[False False True False]
# [False True True False]
# [False False False False]]
Applying the negation operator (~) to the resulting ndarray inverts it, so that NaN positions become False and all other positions become True. This can be used as a mask to remove NaN (extract non-NaN values). Since the number of remaining elements changes, the resulting ndarray does not retain the shape of the original ndarray, but instead becomes flattened (converted to one-dimensional).
print(~np.isnan(a))
# [[ True True False True]
# [ True False False True]
# [ True True True True]]
print(a[~np.isnan(a)])
# [11. 12. 14. 21. 24. 31. 32. 33. 34.]
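To make the flattening concrete, the following sketch compares the shapes before and after masking. The array is built directly with np.array() here instead of being read from the CSV file, and the variable names mask and values are chosen only for illustration.

```python
import numpy as np

# Same values as the sample array, built inline instead of read from a file.
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

mask = ~np.isnan(a)   # True where the value is not NaN
values = a[mask]      # boolean indexing with a 2-D mask returns a 1-D array

print(a.shape)
# (3, 4)
print(values.shape)
# (9,)
print(np.count_nonzero(mask))
# 9
```

The length of the result equals the number of True elements in the mask, which is why the two-dimensional shape cannot be preserved.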
Remove rows containing NaN
To remove rows containing NaN, call the any() method on the ndarray generated by np.isnan(). The any() method returns True if there is at least one True in the ndarray.
By setting axis=1 in any(), it checks whether there is at least one True in each row, indicating the presence of NaN.
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
# [21. nan nan 24.]
# [31. 32. 33. 34.]]
print(np.isnan(a))
# [[False False True False]
# [False True True False]
# [False False False False]]
print(np.isnan(a).any(axis=1))
# [ True True False]
Using the negation operator (~) to swap True and False, rows without any NaN become True.
print(~np.isnan(a).any(axis=1))
# [False False True]
By applying this ndarray to the rows (the first dimension) of the original ndarray, you can remove rows with NaN (extract rows without NaN).
print(a[~np.isnan(a).any(axis=1), :])
# [[31. 32. 33. 34.]]
You can omit the column specification (:) as shown below.
print(a[~np.isnan(a).any(axis=1)])
# [[31. 32. 33. 34.]]
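One point that is easy to misread in ~np.isnan(a).any(axis=1) is the order of evaluation: the method call binds more tightly than ~, so any() runs first and the negation is applied to its result. The sketch below checks this against an equivalent formulation that negates first and then uses all(). The names nan_mask, keep_a, and keep_b are illustrative only.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

nan_mask = np.isnan(a)

# Evaluated as ~(nan_mask.any(axis=1)): "the row has no NaN at all".
keep_a = ~nan_mask.any(axis=1)

# Equivalent form: "every element in the row is non-NaN".
keep_b = (~nan_mask).all(axis=1)

print(np.array_equal(keep_a, keep_b))
# True
print(a[keep_b])
# [[31. 32. 33. 34.]]
```

Either form can be used as the row mask; the first is shorter, while the second may read more naturally as "keep rows where all values are valid".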
To remove only rows where all elements are NaN, use all() instead of any().
Setting axis=1 checks whether all elements in each row are True. Here, np.nan is assigned to some elements so that the second row consists entirely of NaN.
a[1, 0] = np.nan
a[1, 3] = np.nan
print(a)
# [[11. 12. nan 14.]
# [nan nan nan nan]
# [31. 32. 33. 34.]]
print(np.isnan(a).all(axis=1))
# [False True False]
print(~np.isnan(a).all(axis=1))
# [ True False True]
print(a[~np.isnan(a).all(axis=1)])
# [[11. 12. nan 14.]
# [31. 32. 33. 34.]]
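One detail worth keeping in mind when filtering this way: indexing with a boolean mask returns a new array rather than a view, so the original ndarray is left untouched. The following is a small sketch of that behavior; the variable name rows is illustrative.

```python
import numpy as np

a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Boolean-mask indexing produces a copy of the selected rows.
rows = a[~np.isnan(a).any(axis=1)]

# Modifying the filtered result does not affect the original array.
rows[0, 0] = -1
print(a[2, 0])
# 31.0
print(np.shares_memory(a, rows))
# False
```

If the filtered data should replace the original, assign the result back to the same name, for example a = a[~np.isnan(a).any(axis=1)].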
Remove columns containing NaN
The process to remove columns containing NaN is similar to that used for rows.
Using any() with axis=0 checks whether there is at least one True in each column, indicating the presence of NaN. Apply the negation operator (~) to convert columns without any NaN to True.
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
# [21. nan nan 24.]
# [31. 32. 33. 34.]]
print(np.isnan(a))
# [[False False True False]
# [False True True False]
# [False False False False]]
print(np.isnan(a).any(axis=0))
# [False True True False]
print(~np.isnan(a).any(axis=0))
# [ True False False True]
By applying this ndarray to the columns (the second dimension) of the original ndarray, you can remove columns with NaN (extract columns without NaN).
print(a[:, ~np.isnan(a).any(axis=0)])
# [[11. 14.]
# [21. 24.]
# [31. 34.]]
To remove only columns where all elements are NaN, use all() instead of any().
a[2, 2] = np.nan
print(a)
# [[11. 12. nan 14.]
# [21. nan nan 24.]
# [31. 32. nan 34.]]
print(np.isnan(a).all(axis=0))
# [False False True False]
print(~np.isnan(a).all(axis=0))
# [ True True False True]
print(a[:, ~np.isnan(a).all(axis=0)])
# [[11. 12. 14.]
# [21. nan 24.]
# [31. 32. 34.]]
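The row and column cases differ only in which axis is reduced and which dimension is indexed, so they can be wrapped in one small function. The following is a minimal sketch, not part of NumPy: drop_nan() and its axis and how parameters are hypothetical names, and the convention that axis=0 drops rows and axis=1 drops columns is an assumption made for this example.

```python
import numpy as np


def drop_nan(arr, axis=0, how="any"):
    """Hypothetical helper for 2-D arrays (not a NumPy function).

    axis=0 drops rows, axis=1 drops columns (assumed convention).
    how="any" drops a row/column containing at least one NaN;
    how="all" drops it only if every element is NaN.
    """
    if arr.ndim != 2:
        raise ValueError("arr must be two-dimensional")
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")

    nan_mask = np.isnan(arr)
    # To judge each row, reduce across columns (axis=1), and vice versa.
    reduce_axis = 1 - axis
    if how == "any":
        drop = nan_mask.any(axis=reduce_axis)
    else:
        drop = nan_mask.all(axis=reduce_axis)

    return arr[~drop] if axis == 0 else arr[:, ~drop]


a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

print(drop_nan(a, axis=0))
# [[31. 32. 33. 34.]]
print(drop_nan(a, axis=1))
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]
```

With how="all", the array above is returned with its shape unchanged, since no row or column consists entirely of NaN.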