# NumPy: Remove rows / columns with missing values (NaN) in ndarray

Posted: 2020-12-15 / Tags: Python, NumPy

To remove rows and columns containing missing values (`NaN`) from a NumPy array (`numpy.ndarray`), detect `NaN` with `np.isnan()`, then use `any()` or `all()` to select only the rows or columns that do not contain `NaN`.

• Remove all missing values (`NaN`)
• Remove rows containing missing values (`NaN`)
• Remove columns containing missing values (`NaN`)


As an example, read a CSV file with missing data using `np.genfromtxt()`. Missing fields are read as `NaN`.

``````
import numpy as np

a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
``````
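If the CSV file is not at hand, the same array can be reproduced from an in-memory string. This is a minimal sketch: the text in `csv_text` is an assumption reconstructed from the printed array above, not the actual contents of `sample_nan.csv`.

```python
import io

import numpy as np

# Hypothetical stand-in for data/src/sample_nan.csv; the contents are
# assumed from the printed array, with empty fields as missing values.
csv_text = """11,12,,14
21,,,24
31,32,33,34
"""

# np.genfromtxt() also accepts file-like objects such as io.StringIO.
# Empty fields are filled with nan for the default float dtype.
a = np.genfromtxt(io.StringIO(csv_text), delimiter=',')
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]
```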

## Remove all missing values (NaN)

`np.isnan()` returns a boolean `ndarray` that is `True` where the element is `NaN` and `False` elsewhere.

``````
print(np.isnan(a))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]
``````

Negating this array with the `~` operator gives a mask that is `False` for `NaN` and `True` for everything else. Indexing the original array with this mask removes the missing values (= extracts the non-missing values). Because the remaining elements cannot in general fill a rectangular array, the original shape is not kept and the result is a flattened one-dimensional array.

``````
print(~np.isnan(a))
# [[ True  True False  True]
#  [ True False False  True]
#  [ True  True  True  True]]

print(a[~np.isnan(a)])
# [11. 12. 14. 21. 24. 31. 32. 33. 34.]
``````
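The flattening can be checked by comparing shapes. The following is a small sketch that builds the sample array directly with `np.array()` instead of reading the CSV; the name `values` is only for illustration.

```python
import numpy as np

# Same values as the sample array, built without the CSV file.
a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, 33, 34]])

# Boolean indexing with a full-shape mask returns a new 1-D array.
values = a[~np.isnan(a)]
print(values.shape)
# (9,)

# The original array is not modified.
print(a.shape)
# (3, 4)
```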

## Remove rows containing missing values (NaN)

To remove rows containing missing values, use the `any()` method, which returns `True` if there is at least one `True` in the `ndarray`.

With the argument `axis=1`, `any()` tests whether there is at least one `True` for each row.

``````
print(np.isnan(a).any(axis=1))
# [ True  True False]
``````

Use the negation operator `~` to make rows with no missing values `True`.

``````
print(~np.isnan(a).any(axis=1))
# [False False  True]
``````

By applying this boolean array to the first dimension (= rows) of the original array, the rows containing missing values are removed (= the rows without missing values are extracted).

``````
print(a[~np.isnan(a).any(axis=1), :])
# [[31. 32. 33. 34.]]
``````

You can omit the column specification `:` as shown below.

``````
print(a[~np.isnan(a).any(axis=1)])
# [[31. 32. 33. 34.]]
``````

If you want to remove only the rows where all the elements are `NaN`, use `all()` instead of `any()`.

An example using `all()` is shown below.
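For rows, the idea can be sketched as follows. The array `b` is a hypothetical example, not part of the sample CSV, in which the second row consists entirely of `NaN`.

```python
import numpy as np

# Hypothetical array: every element of the second row is NaN.
b = np.array([[11, 12, np.nan, 14],
              [np.nan, np.nan, np.nan, np.nan],
              [31, 32, 33, 34]])

# all(axis=1) is True only for rows where every element is NaN.
print(np.isnan(b).all(axis=1))
# [False  True False]

# The first row still contains NaN but is kept, because not all of it is NaN.
print(b[~np.isnan(b).all(axis=1)])
# [[11. 12. nan 14.]
#  [31. 32. 33. 34.]]
```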

## Remove columns containing missing values (NaN)

The same applies when removing columns containing missing values.

With the argument `axis=0`, `any()` tests if there is at least one `True` for each column. Use the negation operator `~` to make columns with no missing values `True`.

``````
print(~np.isnan(a).any(axis=0))
# [ True False False  True]
``````

By applying this boolean array to the second dimension (= columns) of the original array, the columns containing missing values are removed (= the columns without missing values are extracted).

``````
print(a[:, ~np.isnan(a).any(axis=0)])
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]
``````

If you want to remove only the columns where all the elements are `NaN`, use `all()` instead of `any()`.

``````
a = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
a[2, 2] = np.nan
print(a)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. nan 34.]]

print(a[:, ~np.isnan(a).any(axis=0)])
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]

print(a[:, ~np.isnan(a).all(axis=0)])
# [[11. 12. 14.]
#  [21. nan 24.]
#  [31. 32. 34.]]
``````
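The row and column variants differ only in which axis is reduced and which dimension is indexed, so they can be wrapped in one function. The helper `drop_nan()` below is a hypothetical sketch, not a NumPy function. Note that its `axis` argument names the dimension to drop from (`0` for rows, `1` for columns), which is the opposite of the `axis` passed to `any()` / `all()`.

```python
import numpy as np


def drop_nan(arr, axis=0, how='any'):
    """Drop rows (axis=0) or columns (axis=1) of a 2-D array based on NaN.

    Hypothetical helper for illustration only.
    how='any' drops a row/column with at least one NaN,
    how='all' drops it only if every element is NaN.
    """
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    if how not in ('any', 'all'):
        raise ValueError("how must be 'any' or 'all'")
    nan_mask = np.isnan(arr)
    # To judge rows, reduce across columns (axis=1), and vice versa.
    reduce_axis = 1 - axis
    if how == 'any':
        drop = nan_mask.any(axis=reduce_axis)
    else:
        drop = nan_mask.all(axis=reduce_axis)
    keep = ~drop
    return arr[keep] if axis == 0 else arr[:, keep]


a = np.array([[11, 12, np.nan, 14],
              [21, np.nan, np.nan, 24],
              [31, 32, np.nan, 34]])

print(drop_nan(a, axis=1, how='any'))
# [[11. 14.]
#  [21. 24.]
#  [31. 34.]]

print(drop_nan(a, axis=1, how='all'))
# [[11. 12. 14.]
#  [21. nan 24.]
#  [31. 32. 34.]]
```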