pandas: Select rows by multiple conditions

Modified: | Tags: Python, pandas

This article describes how to select rows of pandas.DataFrame by multiple conditions.

When selecting based on multiple conditions, please keep the following two key points in mind:

  1. Use &, |, and ~ (not and, or, and not)
  2. Enclose each condition in parentheses () when using comparison operators

Note that this article describes the method using Boolean indexing. However, using the query() method can help you write more concisely.

The sample code in this article is based on pandas version 2.0.3. The following pandas.DataFrame is used as an example.

import pandas as pd

print(pd.__version__)
# 2.0.3

df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

While the example focuses on pandas.DataFrame, the same approach applies when filtering elements of pandas.Series with multiple conditions.

Select rows by a certain condition

For a DataFrame, specifying a list or Series of boolean values (True or False) in [] will extract the rows corresponding to True.

mask = [True, False, True, False, True, False]
print(df[mask])
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88

When applying comparison operators to a Series (e.g., a column of a DataFrame), you get a Boolean Series. Using this, rows that meet a certain condition can be extracted.

print(df['age'] < 25)
# 0     True
# 1    False
# 2     True
# 3    False
# 4     True
# 5    False
# Name: age, dtype: bool

print(df[df['age'] < 25])
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 4    Ellen   24    CA     88

The same applies when obtaining a Boolean Series using string methods.

print(df['name'].str.endswith('e'))
# 0     True
# 1    False
# 2     True
# 3     True
# 4    False
# 5    False
# Name: name, dtype: bool

print(df[df['name'].str.endswith('e')])
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70

Select rows by multiple conditions

The &, |, and ~ operators

To filter rows based on multiple conditions, apply the &, |, and ~ operators for AND, OR, and NOT respectively to multiple Boolean Series.

For AND operations between two Boolean Series, use &.

print((df['age'] < 25) & df['name'].str.endswith('e'))
# 0     True
# 1    False
# 2     True
# 3    False
# 4    False
# 5    False
# dtype: bool

print(df[(df['age'] < 25) & df['name'].str.endswith('e')])
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70

For OR operations, use |, and for NOT operations, use ~.

print(~df['name'].str.endswith('e'))
# 0    False
# 1     True
# 2    False
# 3    False
# 4     True
# 5     True
# Name: name, dtype: bool

print(df['point'] < 65)
# 0     True
# 1    False
# 2    False
# 3    False
# 4    False
# 5     True
# Name: point, dtype: bool

print(~df['name'].str.endswith('e') | (df['point'] < 65))
# 0     True
# 1     True
# 2    False
# 3    False
# 4     True
# 5     True
# dtype: bool

print(df[~df['name'].str.endswith('e') | (df['point'] < 65)])
#     name  age state  point
# 0  Alice   24    NY     64
# 1    Bob   42    CA     92
# 4  Ellen   24    CA     88
# 5  Frank   30    NY     57

As mentioned earlier, remember these two points:

  1. Use &, |, and ~ (not and, or, and not)
  2. Enclose each condition in parentheses () when using comparison operators

For example, using and, or, and not would result in the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

For details on why you need to use &, |, and ~ instead of and, or, and not, and the necessity of parentheses, refer to the article below:

The isin() method

The isin() method of Series returns a Series where elements that match any elements specified in the argument list are marked as True.

As in the previous examples, you can extract rows that match any elements specified using isin().

print(df['state'].isin(['NY', 'TX']))
# 0     True
# 1    False
# 2    False
# 3     True
# 4    False
# 5     True
# Name: state, dtype: bool

print(df[df['state'].isin(['NY', 'TX'])])
#     name  age state  point
# 0  Alice   24    NY     64
# 3   Dave   68    TX     70
# 5  Frank   30    NY     57

This is equivalent to the conditional extraction using == and |.

print(df[(df['state'] == 'NY') | (df['state'] == 'TX')])
#     name  age state  point
# 0  Alice   24    NY     64
# 3   Dave   68    TX     70
# 5  Frank   30    NY     57

The isin() method can only be used for exact matches. If you want to filter by partial matches, use the aforementioned string methods combined with & or |.

print(df[df['name'].str.contains('li') | df['name'].str.endswith('k')])
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 5    Frank   30    NY     57

Operator precedence

In Python, the operator precedence is as follows: ~ has the highest priority, followed by &, and then |.

When working with three or more conditions, the outcome can vary based on their order.

df_multi_1 = df[(df['age'] < 35) | ~(df['state'] == 'NY') & (df['point'] < 75)]
print(df_multi_1)
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

df_multi_2 = df[(df['age'] < 35) & (df['point'] < 75) | ~(df['state'] == 'NY')]
print(df_multi_2)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

For clarity and to avoid mistakes, it's recommended to enclose the conditions in parentheses.

df_multi_3 = df[((df['age'] < 35) | ~(df['state'] == 'NY')) & (df['point'] < 75)]
print(df_multi_3)
#       name  age state  point
# 0    Alice   24    NY     64
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 5    Frank   30    NY     57

Related Categories

Related Articles