note.nkmk.me

NumPy: Remove rows / columns with missing value (NaN) in ndarray

Posted: 2020-12-15 / Tags: Python, NumPy

To remove rows or columns containing missing values (NaN) from a NumPy array (numpy.ndarray), detect NaN with np.isnan(), then use any() or all() to build a boolean mask and extract the rows or columns that do not contain NaN.

This article describes the following contents.

  • Remove all missing values (NaN)
  • Remove rows containing missing values (NaN)
  • Remove columns containing missing values (NaN)


As an example, read a CSV file containing missing data with np.genfromtxt().

import numpy as np

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

Remove all missing values (NaN)

np.isnan() returns a boolean ndarray in which missing values are True and all other elements are False.

print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]

Negating this array with the ~ operator gives a mask that is False for NaN and True otherwise. Indexing the original array with this mask removes the missing values (= extracts the non-missing values). Because the remaining elements generally no longer fill a rectangular array, the shape of the original array is not kept and the result is a flattened 1D array.

print(~np.isnan(a))
# [[ True  True False  True]
#  [ True False False  True]
#  [ True  True  True  True]]

print(a[~np.isnan(a)])
# [11. 12. 14. 21. 24. 31. 32. 33. 34.]
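
The same filtering can be reproduced without the CSV file. The following is a minimal sketch that builds the example array by hand (assumed to match the contents of the CSV above; the names mask and values are only illustrative) and checks that indexing with a 2D boolean mask returns a 1D array.

```python
import numpy as np

# Hand-built stand-in for the CSV example (assumed to match its contents).
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

mask = ~np.isnan(a)   # True where the value is not NaN
values = a[mask]      # indexing with a 2D boolean mask returns a 1D array

print(values)
# [11. 12. 14. 21. 24. 31. 32. 33. 34.]

print(a.shape, values.shape)
# (3, 4) (9,)
```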

Remove rows containing missing values (NaN)

To remove rows containing missing values, use the any() method, which returns True if at least one element of the ndarray is True.

With the argument axis=1, any() tests whether there is at least one True for each row.

print(np.isnan(a).any(axis=1))
# [ True  True False]

Use the negation operator ~ to make rows with no missing values True.

print(~np.isnan(a).any(axis=1))
# [False False  True]

Applying this boolean array to the first dimension (= rows) of the original array removes the rows containing missing values (= extracts the rows without missing values).

print(a[~np.isnan(a).any(axis=1), :])
# [[31. 32. 33. 34.]]

You can omit the column specification : as shown below.

print(a[~np.isnan(a).any(axis=1)])
# [[31. 32. 33. 34.]]
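
If this operation is needed in several places, it may be convenient to wrap it in a small function. The sketch below assumes a 2D float array; drop_nan_rows is a hypothetical helper name, not a NumPy function, and the array is built by hand to match the example above.

```python
import numpy as np

def drop_nan_rows(arr):
    """Return a copy of a 2D array without the rows that contain NaN.

    Hypothetical helper for illustration; not part of NumPy.
    """
    return arr[~np.isnan(arr).any(axis=1)]

# Hand-built stand-in for the CSV example (assumed to match its contents).
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

result = drop_nan_rows(a)
print(result)
# [[31. 32. 33. 34.]]

print(result.shape)
# (1, 4)
```

Note that the result stays two-dimensional even when only one row remains.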

If you want to remove only the rows where all the elements are NaN, use all() instead of any().

An example using all() is shown below.
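
For rows, a minimal sketch could look like the following. The array b is a hypothetical variant of the example data in which the second row consists entirely of NaN.

```python
import numpy as np

# Hypothetical data: the second row is entirely NaN, the first row only partly.
b = np.array([[11, 12, np.nan, 14],
              [np.nan, np.nan, np.nan, np.nan],
              [31, 32, 33, 34]])

print(np.isnan(b).all(axis=1))
# [False  True False]

# Only the all-NaN row is removed; the row with a single NaN is kept.
print(b[~np.isnan(b).all(axis=1)])
# [[11. 12. nan 14.]
#  [31. 32. 33. 34.]]
```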


Remove columns containing missing values (NaN)

The same applies when removing columns containing missing values.

With the argument axis=0, any() tests if there is at least one True for each column. Use the negation operator ~ to make columns with no missing values True.

print(~np.isnan(a).any(axis=0))
# [ True False False  True]

Applying this boolean array to the second dimension (= columns) of the original array removes the columns containing missing values (= extracts the columns without missing values).

print(a[:, ~np.isnan(a).any(axis=0)])
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]

If you want to remove only the columns where all the elements are NaN, use all() instead of any(). In the example below, a[2, 2] is set to NaN so that the third column consists entirely of NaN.

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
a[2, 2] = np.nan
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. nan 34.]]

print(a[:, ~np.isnan(a).any(axis=0)])
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]

print(a[:, ~np.isnan(a).all(axis=0)])
# [[11. 12. 14.]
#  [21. nan 24.]
#  [31. 32. 34.]]
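
One way to see the difference between any() and all() is to count the NaN values per column. The sketch below is only an illustration under the assumption that the array matches the modified example above; the variable names are hypothetical.

```python
import numpy as np

# Hand-built stand-in for the modified example (third column is all NaN).
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, np.nan, 34]])

nan_counts = np.isnan(a).sum(axis=0)  # number of NaN values in each column
print(nan_counts)
# [0 1 3 0]

has_any_nan = nan_counts > 0            # same result as np.isnan(a).any(axis=0)
is_all_nan = nan_counts == a.shape[0]   # same result as np.isnan(a).all(axis=0)

print(has_any_nan)
# [False  True  True False]

print(is_all_nan)
# [False False  True False]
```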