pandas: Select rows by multiple conditions
This article describes how to select rows of pandas.DataFrame
by multiple conditions.
When selecting based on multiple conditions, please keep the following two key points in mind:
- Use
&
,|
, and~
(notand
,or
, andnot
) - Enclose each condition in parentheses
()
when using comparison operators
Note that this article describes the method using Boolean indexing. However, using the query()
method can help you write more concisely.
The sample code in this article is based on pandas version 2.0.3
. The following pandas.DataFrame
is used as an example.
import pandas as pd
print(pd.__version__)
# 2.0.3
df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
While the example focuses on pandas.DataFrame
, the same approach applies when filtering elements of pandas.Series
with multiple conditions.
Select rows by a certain condition
For a DataFrame
, specifying a list or Series
of boolean values (True
or False
) in []
will extract the rows corresponding to True
.
mask = [True, False, True, False, True, False]
print(df[mask])
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
When applying comparison operators to a Series
(e.g., a column of a DataFrame
), you get a Boolean Series
. Using this, rows that meet a certain condition can be extracted.
print(df['age'] < 25)
# 0 True
# 1 False
# 2 True
# 3 False
# 4 True
# 5 False
# Name: age, dtype: bool
print(df[df['age'] < 25])
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 4 Ellen 24 CA 88
The same applies when obtaining a Boolean Series
using string methods.
print(df['name'].str.endswith('e'))
# 0 True
# 1 False
# 2 True
# 3 True
# 4 False
# 5 False
# Name: name, dtype: bool
print(df[df['name'].str.endswith('e')])
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
Select rows by multiple conditions
The &
, |
, and ~
operators
To filter rows based on multiple conditions, apply the &
, |
, and ~
operators for AND, OR, and NOT respectively to multiple Boolean Series
.
For AND operations between two Boolean Series
, use &
.
print((df['age'] < 25) & df['name'].str.endswith('e'))
# 0 True
# 1 False
# 2 True
# 3 False
# 4 False
# 5 False
# dtype: bool
print(df[(df['age'] < 25) & df['name'].str.endswith('e')])
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
For OR operations, use |
, and for NOT operations, use ~
.
print(~df['name'].str.endswith('e'))
# 0 False
# 1 True
# 2 False
# 3 False
# 4 True
# 5 True
# Name: name, dtype: bool
print(df['point'] < 65)
# 0 True
# 1 False
# 2 False
# 3 False
# 4 False
# 5 True
# Name: point, dtype: bool
print(~df['name'].str.endswith('e') | (df['point'] < 65))
# 0 True
# 1 True
# 2 False
# 3 False
# 4 True
# 5 True
# dtype: bool
print(df[~df['name'].str.endswith('e') | (df['point'] < 65)])
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 92
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
As mentioned earlier, remember these two points:
- Use
&
,|
, and~
(notand
,or
, andnot
) - Enclose each condition in parentheses
()
when using comparison operators
For example, using and
, or
, and not
would result in the following error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
For details on why you need to use &
, |
, and ~
instead of and
, or
, and not
, and the necessity of parentheses, refer to the article below:
The isin()
method
The isin()
method of Series
returns a Series
where elements that match any elements specified in the argument list are marked as True
.
As in the previous examples, you can extract rows that match any elements specified using isin()
.
print(df['state'].isin(['NY', 'TX']))
# 0 True
# 1 False
# 2 False
# 3 True
# 4 False
# 5 True
# Name: state, dtype: bool
print(df[df['state'].isin(['NY', 'TX'])])
# name age state point
# 0 Alice 24 NY 64
# 3 Dave 68 TX 70
# 5 Frank 30 NY 57
This is equivalent to the conditional extraction using ==
and |
.
print(df[(df['state'] == 'NY') | (df['state'] == 'TX')])
# name age state point
# 0 Alice 24 NY 64
# 3 Dave 68 TX 70
# 5 Frank 30 NY 57
The isin()
method can only be used for exact matches. If you want to filter by partial matches, use the aforementioned string methods combined with &
or |
.
print(df[df['name'].str.contains('li') | df['name'].str.endswith('k')])
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 5 Frank 30 NY 57
Operator precedence
In Python, the operator precedence is as follows: ~
has the highest priority, followed by &
, and then |
.
When working with three or more conditions, the outcome can vary based on their order.
df_multi_1 = df[(df['age'] < 35) | ~(df['state'] == 'NY') & (df['point'] < 75)]
print(df_multi_1)
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
df_multi_2 = df[(df['age'] < 35) & (df['point'] < 75) | ~(df['state'] == 'NY')]
print(df_multi_2)
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
For clarity and to avoid mistakes, it's recommended to enclose the conditions in parentheses.
df_multi_3 = df[((df['age'] < 35) | ~(df['state'] == 'NY')) & (df['point'] < 75)]
print(df_multi_3)
# name age state point
# 0 Alice 24 NY 64
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 5 Frank 30 NY 57