note.nkmk.me

NumPy: Count the number of elements satisfying the condition

Posted: 2019-05-29 / Modified: 2019-11-05 / Tags: Python, NumPy

This article describes how to count the number of elements in a NumPy array ndarray that satisfy a condition, with sample code.

  • For the entire ndarray
  • For each row and column of ndarray
  • Check if there is at least one element satisfying the condition: numpy.any()
  • Check if all elements satisfy the conditions: numpy.all()
  • Multiple conditions
  • Count missing values NaN and infinity inf

Count the number of elements satisfying the condition for the entire ndarray

A comparison operation on an ndarray returns an ndarray of bool values (True or False).

import numpy as np

a = np.arange(12).reshape((3, 4))
print(a)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(a < 4)
# [[ True  True  True  True]
#  [False False False False]
#  [False False False False]]

print(a % 2 == 1)
# [[False  True False  True]
#  [False  True False  True]
#  [False  True False  True]]

np.count_nonzero() returns the number of True values, i.e., the number of elements that satisfy the condition.

print(np.count_nonzero(a < 4))
# 4

print(np.count_nonzero(a % 2 == 1))
# 6

Since True is treated as 1 and False as 0, you can also use np.sum(). However, np.count_nonzero() is faster than np.sum().

print(np.sum(a < 4))
# 4

print(np.sum(a % 2 == 1))
# 6
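
The speed difference can be checked locally. The following is a minimal sketch using the standard-library timeit module. The array size and the number of repetitions are arbitrary choices for illustration, and actual timings depend on the NumPy version and the machine, so no expected timings are shown.

```python
import timeit

import numpy as np

# Arbitrary large array for illustration (the size is an assumption).
big = np.arange(1_000_000).reshape((1000, 1000))
cond = big % 2 == 1

# Both calls return the same count.
n_count = np.count_nonzero(cond)
n_sum = np.sum(cond)
print(n_count == n_sum)
# True

# Timings vary by environment, so only the measurement itself is shown.
t_count = timeit.timeit(lambda: np.count_nonzero(cond), number=100)
t_sum = timeit.timeit(lambda: np.sum(cond), number=100)
print(t_count, t_sum)
```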

Count the number of elements satisfying the condition for each row and column of ndarray

For a multi-dimensional array, np.count_nonzero() counts along each axis (each dimension) when the parameter axis is specified.

In the case of a two-dimensional array, axis=0 gives the count per column and axis=1 gives the count per row.

By using this, you can count the number of elements satisfying the conditions for each row and column.

print(np.count_nonzero(a < 4, axis=0))
# [1 1 1 1]

print(np.count_nonzero(a < 4, axis=1))
# [4 0 0]

print(np.count_nonzero(a % 2 == 1, axis=0))
# [0 3 0 3]

print(np.count_nonzero(a % 2 == 1, axis=1))
# [2 2 2]

Note that the parameter axis of np.count_nonzero() is new in 1.12.0. In older versions, you can use np.sum() instead. In np.sum(), you can specify axis from version 1.7.0.
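
As a sketch of the np.sum() alternative, the per-column and per-row counts for the same sample array can be obtained as follows. Because True counts as 1 and False as 0, the results match those of np.count_nonzero() with axis.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# Summing the boolean array along an axis counts the True values on that axis.
col_counts = np.sum(a < 4, axis=0)
row_counts = np.sum(a < 4, axis=1)

print(col_counts)
# [1 1 1 1]

print(row_counts)
# [4 0 0]
```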

Check if there is at least one element satisfying the condition: numpy.any()

np.any() is a function that returns True when the ndarray passed as the first parameter contains at least one True element, and returns False otherwise.

print(np.any(a < 4))
# True

print(np.any(a > 100))
# False

As with np.count_nonzero(), np.any() operates on each row or column when the parameter axis is specified.

print(np.any(a < 4, axis=0))
# [ True  True  True  True]

print(np.any(a < 4, axis=1))
# [ True False False]
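
To relate np.any() to counting, the following small sketch checks that np.any() gives the same answer as testing whether the count is greater than zero. The variable names are chosen only for illustration.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# np.any(cond) is True exactly when at least one element is counted.
any_result = np.any(a < 4)
count_result = np.count_nonzero(a < 4) > 0
print(any_result == count_result)
# True

# The same holds per row when axis is given.
rows_any = np.any(a < 4, axis=1)
rows_count = np.count_nonzero(a < 4, axis=1) > 0
print(np.array_equal(rows_any, rows_count))
# True
```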

Check if all elements satisfy the conditions: numpy.all()

np.all() is a function that returns True when all elements of the ndarray passed as the first parameter are True, and returns False otherwise.

print(np.all(a < 4))
# False

print(np.all(a < 100))
# True

As with np.count_nonzero(), np.all() operates on each row or column when the parameter axis is specified.

print(np.all(a < 4, axis=0))
# [False False False False]

print(np.all(a < 4, axis=1))
# [ True False False]
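
One edge case worth keeping in mind is an empty array: np.all() returns True because no element violates the condition, while np.any() returns False. The sketch below illustrates this with a hypothetical empty array, and shows that checking the count makes the empty case explicit.

```python
import numpy as np

# An empty array has no element that violates the condition.
empty = np.array([])

all_empty = np.all(empty < 4)
any_empty = np.any(empty < 4)
n_empty = np.count_nonzero(empty < 4)

print(all_empty)
# True

print(any_empty)
# False

print(n_empty)
# 0
```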

Multiple conditions

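If you want to combine multiple conditions, enclose each conditional expression in () and use & (AND) or | (OR). The parentheses are required because & and | have higher precedence than comparison operators such as < and ==.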

print((a < 4) | (a % 2 == 1))
# [[ True  True  True  True]
#  [False  True False  True]
#  [False  True False  True]]

print(np.count_nonzero((a < 4) | (a % 2 == 1)))
# 8

print(np.count_nonzero((a < 4) | (a % 2 == 1), axis=0))
# [1 3 1 3]

print(np.count_nonzero((a < 4) | (a % 2 == 1), axis=1))
# [4 2 2]
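
The same pattern works for AND (&) and NOT (~). The following is a short sketch on the same sample array; the variable names are only for illustration.

```python
import numpy as np

a = np.arange(12).reshape((3, 4))

# AND: elements that are less than 4 and odd (1 and 3).
n_and = np.count_nonzero((a < 4) & (a % 2 == 1))
print(n_and)
# 2

# NOT: elements that are neither less than 4 nor odd (4, 6, 8, 10).
n_not = np.count_nonzero(~((a < 4) | (a % 2 == 1)))
print(n_not)
# 4
```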

Count missing values NaN and infinity inf

To count the number of missing values NaN, you need to use a dedicated function, np.isnan().

As an example of missing values NaN, read a CSV file with missing data.

a_nan = np.genfromtxt('data/src/sample_nan.csv', delimiter=',')
print(a_nan)
# [[11. 12. nan 14.]
#  [21. nan nan 24.]
#  [31. 32. 33. 34.]]

A missing value NaN can be generated with np.nan, float('nan'), etc. However, comparing NaN with == always returns False, even when both sides are NaN. To count missing values, you need to use np.isnan().

print(np.nan == np.nan)
# False

print(a_nan == np.nan)
# [[False False False False]
#  [False False False False]
#  [False False False False]]

print(np.isnan(a_nan))
# [[False False  True False]
#  [False  True  True False]
#  [False False False False]]

After that, just like the previous examples, you can count the number of True with np.count_nonzero() or np.sum().

print(np.count_nonzero(np.isnan(a_nan)))
# 3

print(np.count_nonzero(np.isnan(a_nan), axis=0))
# [0 1 2 0]

print(np.count_nonzero(np.isnan(a_nan), axis=1))
# [1 2 0]

If you want to count elements that are not missing values, use negation ~.

print(~np.isnan(a_nan))
# [[ True  True False  True]
#  [ True False False  True]
#  [ True  True  True  True]]
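
Counting the non-missing values then works the same way as before. In the sketch below, the array is built inline with the same values as the CSV example so that the snippet is self-contained; this replaces the file read and is an assumption for illustration.

```python
import numpy as np

# Same values as the CSV example, built inline instead of read from a file.
a_nan = np.array([[11, 12, np.nan, 14],
                  [21, np.nan, np.nan, 24],
                  [31, 32, 33, 34]])

n_valid = np.count_nonzero(~np.isnan(a_nan))
print(n_valid)
# 9

per_col = np.count_nonzero(~np.isnan(a_nan), axis=0)
print(per_col)
# [3 2 1 3]

per_row = np.count_nonzero(~np.isnan(a_nan), axis=1)
print(per_row)
# [3 2 4]
```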

The function that determines whether an element is infinite inf (such as np.inf) is np.isinf(). It returns True for both positive and negative infinity.

a_inf = np.array([-np.inf, 0, np.inf])
print(a_inf)
# [-inf   0.  inf]

print(np.isinf(a_inf))
# [ True False  True]

Unlike NaN, inf can be compared with ==. If you want to detect only positive or only negative infinity, compare with np.inf or -np.inf using ==.

print(a_inf == np.inf)
# [False False  True]

print(a_inf == -np.inf)
# [ True False False]

After that, just like the previous examples, you can count the number of True with np.count_nonzero() or np.sum().
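
For example, a minimal sketch of counting infinite values in the sample array above (variable names are only for illustration):

```python
import numpy as np

a_inf = np.array([-np.inf, 0, np.inf])

# Count both positive and negative infinity.
n_inf = np.count_nonzero(np.isinf(a_inf))
print(n_inf)
# 2

# Count only positive infinity.
n_pos = np.count_nonzero(a_inf == np.inf)
print(n_pos)
# 1

# Count only negative infinity.
n_neg = np.count_nonzero(a_inf == -np.inf)
print(n_neg)
# 1
```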
