note.nkmk.me

in operator in Python (for list, string, dictionary, etc.)

Posted: 2021-01-14 / Tags: Python, List

In Python, the operators in and not in test membership in lists, tuples, dictionaries, and so on.

This article covers the following topics.

  • How to use the in operator
    • Basic usage
    • With if statement
    • in for the dictionary (dict)
    • in for the string (str)
  • not in (negation of in)
  • in for multiple elements
    • Use and, or
    • Use sets
  • Time complexity of in
    • Slow for the list: O(n)
    • Fast for the set: O(1)
    • For the dictionary
  • in in for statements and list comprehensions

The word in is also used in for statements and list comprehensions. See the following articles for details.


How to use the in operator

Basic usage

x in y returns True if x is included in y, and False if it is not.

print(1 in [0, 1, 2])
# True

print(100 in [0, 1, 2])
# False
source: in_basic.py

Besides lists, in also works with tuples, sets, range objects, and other iterables.

print(1 in (0, 1, 2))
# True

print(1 in {0, 1, 2})
# True

print(1 in range(3))
# True
source: in_basic.py

The dictionary (dict) and the string (str) are described later.
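The membership test itself is carried out by the container. Roughly: if the type of y defines `__contains__()`, `x in y` calls it; otherwise Python iterates over y and compares each item with x (first by identity, then with `==`). The classes `EvenNumbers` and `Countdown` below are hypothetical examples made up to illustrate these two paths.

```python
class EvenNumbers:
    """Hypothetical container: membership is decided by __contains__."""

    def __contains__(self, x):
        return isinstance(x, int) and x % 2 == 0


class Countdown:
    """Hypothetical iterable without __contains__; `in` falls back to iteration."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, -1, -1))


print(4 in EvenNumbers())
# True

print(3 in EvenNumbers())
# False

print(2 in Countdown(5))
# True

print(9 in Countdown(5))
# False

# Elements are compared by identity or ==, so an equal value of another type matches
print(1.0 in [0, 1, 2])
# True
```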

With if statement

in returns a bool value (True or False), so it can be used directly as the condition of an if statement.

l = [0, 1, 2]
i = 0

if i in l:
    print('{} is a member of {}.'.format(i, l))
else:
    print('{} is not a member of {}.'.format(i, l))
# 0 is a member of [0, 1, 2].
source: in_basic.py
l = [0, 1, 2]
i = 100

if i in l:
    print('{} is a member of {}.'.format(i, l))
else:
    print('{} is not a member of {}.'.format(i, l))
# 100 is not a member of [0, 1, 2].
source: in_basic.py

Note that lists, tuples, strings, and similar containers evaluate as False when they are empty and as True otherwise. To check whether such an object is empty, you can therefore use the object itself as the condition.

l = [0, 1, 2]

if l:
    print('not empty')
else:
    print('empty')
# not empty
source: in_basic.py
l = []

if l:
    print('not empty')
else:
    print('empty')
# empty
source: in_basic.py
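A quick way to see this rule across several built-in types is to compare `bool(obj)` with `len(obj) > 0`. The list `samples` below is just an illustrative selection; the output is omitted here.

```python
samples = [[], (), '', {}, set(), range(0), [0], (0,), ' ', {'k': 0}]

for obj in samples:
    # `if obj:` evaluates bool(obj); for these built-ins it equals len(obj) > 0
    print(repr(obj), bool(obj), len(obj) > 0)
```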

See also the following articles for truth value testing for each type.

"in" for the dictionary (dict)

For a dictionary (dict), the in operator tests whether a value is one of its keys.

d = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

print('key1' in d)
# True

print('value1' in d)
# False
source: in_basic.py

To test values or key-value pairs instead, use values() or items().

print('value1' in d.values())
# True

print(('key1', 'value1') in d.items())
# True

print(('key1', 'value2') in d.items())
# False
source: in_basic.py

See the following article for details.

"in" for the string (str)

The in operation for the string (str) tests the existence of a substring.

print('a' in 'abc')
# True

print('x' in 'abc')
# False

print('ab' in 'abc')
# True

print('ac' in 'abc')
# False
source: in_basic.py
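For strings, `sub in s` gives the same answer as checking whether `str.find()` returns something other than -1; in is simply the more readable spelling. Note also that the empty string counts as a substring of every string.

```python
s = 'abc'

for sub in ['a', 'x', 'ab', 'ac', '']:
    # `sub in s` agrees with `s.find(sub) != -1`
    print(repr(sub), sub in s, s.find(sub) != -1)
# 'a' True True
# 'x' False False
# 'ab' True True
# 'ac' False False
# '' True True
```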

not in (negation of "in")

x not in y returns the negation of x in y.

print(10 in [1, 2, 3])
# False

print(10 not in [1, 2, 3])
# True
source: in_basic.py

Placing not in front of the entire in expression returns the same result.

print(not 10 in [1, 2, 3])
# True
source: in_basic.py

However, not placed in front of an in expression can be read in two ways, as shown below, so the more explicit not in is recommended.

print(not (10 in [1, 2, 3]))
# True

print((not 10) in [1, 2, 3])
# False
source: in_basic.py

Because in has higher precedence than not, the expression is interpreted as the former when there are no parentheses.

The latter would be evaluated as follows.

print(not 10)
# False

print(False in [1, 2, 3])
# False
source: in_basic.py
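Because of this precedence, `not x in l` and `x not in l` always agree. A quick check over a few arbitrary sample values:

```python
l = [1, 2, 3]

for x in [0, 1, 10, False, None]:
    # `not x in l` parses as `not (x in l)`, so all three spellings agree
    assert (not x in l) == (x not in l) == (not (x in l))

print('all equal')
# all equal
```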

"in" for multiple elements

If you want to check whether multiple elements are included, passing a list of them as shown below does not work, because it tests whether the list itself is an element.

print([0, 1] in [0, 1, 2])
# False

print([0, 1] in [[0, 1], [1, 0]])
# True
source: in_basic.py

Instead, use and and or, or use sets.

Use and, or

Combine multiple in operations with and or or to test whether both or either of the values are included.

l = [0, 1, 2]
v1 = 0
v2 = 100

print(v1 in l and v2 in l)
# False

print(v1 in l or v2 in l)
# True

print((v1 in l) or (v2 in l))
# True
source: in_basic.py

Since in and not in have higher precedence than and and or, parentheses are not necessary. You can still add them for readability, as in the last example.
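With more than two values, chaining and and or gets long. The built-in functions `all()` and `any()` express the same checks for any number of values and, like and and or, stop as soon as the result is known. The list `values` is an illustrative choice.

```python
l = [0, 1, 2]
values = [0, 1, 100]

# Same as: 0 in l and 1 in l and 100 in l
print(all(v in l for v in values))
# False

# Same as: 0 in l or 1 in l or 100 in l
print(any(v in l for v in values))
# True
```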

Use sets

If you have many elements to check, using sets is easier than chaining and and or.

For example, checking whether list A contains all the elements of list B is equivalent to checking whether the set of B is a subset of the set of A.

l1 = [0, 1, 2, 3, 4]
l2 = [0, 1, 2]
l3 = [0, 1, 5]
l4 = [5, 6, 7]

print(set(l2) <= set(l1))
# True

print(set(l3) <= set(l1))
# False
source: in_basic.py

Whether list A contains none of the elements of list B is equivalent to whether their sets are disjoint (have no elements in common).

print(set(l1).isdisjoint(set(l4)))
# True
source: in_basic.py

If the sets of list A and list B are not disjoint, list A contains at least one element of list B.

print(not set(l1).isdisjoint(set(l3)))
# True
source: in_basic.py
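The set methods `issubset()` and `isdisjoint()` accept any iterable as their argument, so only one side needs to be converted. If you also want to know which elements matched, the `&` operator returns the common elements.

```python
l1 = [0, 1, 2, 3, 4]
l2 = [0, 1, 2]
l3 = [0, 1, 5]
l4 = [5, 6, 7]

# issubset() and isdisjoint() accept any iterable, not just sets
print(set(l2).issubset(l1))
# True

print(set(l1).isdisjoint(l4))
# True

# & returns the common elements; sorted() gives a stable display order
print(sorted(set(l1) & set(l3)))
# [0, 1]
```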

Time complexity of "in"

The execution speed of the in operator depends on the type of the target object.

The measurement results of the execution time of in for lists, sets, and dictionaries are shown below.

Note that the code below uses the Jupyter Notebook magic command %%timeit and does not work when run as a Python script.

See the following article for time complexity.

Take lists of 10 and 10,000 elements as examples.

n_small = 10
n_large = 10000

l_small = list(range(n_small))
l_large = list(range(n_large))
source: in_timeit.py

The sample code below was run on CPython 3.7.4; results will vary depending on the environment.
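Since %%timeit works only in Jupyter/IPython, here is a rough equivalent for a plain Python script using the standard `timeit` module. The helper name `measure` is hypothetical, and it reports a simple average rather than the mean ± std. dev. over several runs that %%timeit prints. Timings depend on your machine, so no output is shown.

```python
import timeit

n_small = 10
n_large = 10000

l_small = list(range(n_small))
l_large = list(range(n_large))


def measure(stmt, number=1000):
    """Return the average time of one execution of stmt, in seconds."""
    # globals=globals() lets the statement see l_small and l_large
    total = timeit.timeit(stmt, globals=globals(), number=number)
    return total / number


for stmt in ['-1 in l_small', '-1 in l_large']:
    print(f'{stmt}: {measure(stmt) * 1e9:.1f} ns per loop')
```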

Slow for the list: O(n)

The average time complexity of the in operator for lists is O(n), so it gets slower as the number of elements grows.

%%timeit
-1 in l_small
# 178 ns ± 4.78 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

%%timeit
-1 in l_large
# 128 µs ± 11.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
source: in_timeit.py

The execution time depends heavily on the position of the value being searched for. It is longest when the value is at the end or does not exist.

%%timeit
0 in l_large
# 33.4 ns ± 0.397 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
5000 in l_large
# 66.1 µs ± 4.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%%timeit
9999 in l_large
# 127 µs ± 2.17 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
source: in_timeit.py
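The position dependence follows from how list membership works: a linear scan that stops at the first match. The pure-Python sketch below mimics that logic and counts how many items it checks; `linear_contains` is a hypothetical name, and CPython's real implementation is written in C.

```python
def linear_contains(seq, x):
    """Mimic `x in seq` for a list; return (found, number of items checked)."""
    checked = 0
    for item in seq:
        checked += 1
        # list membership matches on identity or equality
        if item is x or item == x:
            return True, checked
    return False, checked


l_large = list(range(10000))

print(linear_contains(l_large, 0))
# (True, 1)

print(linear_contains(l_large, 5000))
# (True, 5001)

print(linear_contains(l_large, -1))
# (False, 10000)
```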

Fast for the set: O(1)

The average time complexity of the in operator for sets is O(1). It does not depend on the number of elements.

s_small = set(l_small)
s_large = set(l_large)

%%timeit
-1 in s_small
# 40.4 ns ± 0.572 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
-1 in s_large
# 39.4 ns ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
source: in_timeit.py

The execution time also barely changes with the value being searched for.

%%timeit
0 in s_large
# 39.7 ns ± 1.27 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
5000 in s_large
# 53.1 ns ± 0.974 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%%timeit
9999 in s_large
# 52.4 ns ± 0.403 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
source: in_timeit.py
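Set lookups are fast because they locate the value by its hash instead of scanning. One consequence is that the value being tested must be hashable: testing a list against a set raises TypeError rather than returning False, whereas the same test against a list simply returns False.

```python
s = {0, 1, 2}

# Tuples are hashable, so they can be looked up in a set
print((0, 1) in {(0, 1), (1, 0)})
# True

# A list is not hashable, so the set lookup itself fails
try:
    print([0, 1] in s)
except TypeError as e:
    print('TypeError:', e)

# A list target only compares with ==, so no error here
print([0, 1] in [0, 1, 2])
# False
```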

If you perform many in operations on a list with many elements, it is faster to convert it to a set first.

%%timeit
for i in range(n_large):
    i in l_large
# 643 ms ± 29.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%%timeit
s_large_ = set(l_large)
for i in range(n_large):
    i in s_large_
# 746 µs ± 6.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
source: in_timeit.py

Note that converting a list to a set takes time, so if you perform only a few in operations, keeping the list may be faster.
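Whether the conversion pays off depends on how many lookups you do. A rough way to check on your own machine with the `timeit` module is sketched below; the helper names, the lookup counts, and `number=20` are arbitrary choices for illustration, and the printed times are not shown because they vary.

```python
import timeit

data = list(range(10000))


def lookups_with_list(queries):
    # each lookup scans the list: O(len(data)) per query
    return sum(q in data for q in queries)


def lookups_with_set(queries):
    # one O(len(data)) conversion, then O(1) on average per query
    s = set(data)
    return sum(q in s for q in queries)


for k in [1, 10, 100]:
    queries = list(range(-k, 0))  # values not in data: the worst case for the list
    t_list = timeit.timeit(lambda: lookups_with_list(queries), number=20)
    t_set = timeit.timeit(lambda: lookups_with_set(queries), number=20)
    print(f'{k:>4} lookups: list {t_list:.4f} s, set {t_set:.4f} s')
```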

For the dictionary

Take the following dictionary as an example.

d = dict(zip(l_large, l_large))
print(len(d))
# 10000

print(d[0])
# 0

print(d[9999])
# 9999
source: in_timeit.py

As mentioned above, the in operation for the dictionary tests on keys.

Like the elements of a set, dictionary keys are unique and are looked up by hash, so the execution time is about the same as for sets.

%%timeit
for i in range(n_large):
    i in d
# 756 µs ± 24.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
source: in_timeit.py
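Testing `i in d` is equivalent to testing `i in d.keys()`. The keys view is also set-like, so set operators work on it directly, for example to find which of several candidate keys exist. The candidate set `{0, 5, 20000}` is an arbitrary example.

```python
d = dict(zip(range(10000), range(10000)))

print(5 in d, 5 in d.keys())
# True True

# & between a keys view and a set returns a new set of the common keys
print(sorted(d.keys() & {0, 5, 20000}))
# [0, 5]
```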

On the other hand, dictionary values may contain duplicates, just like a list, and in for values() checks them one by one. Its execution time is therefore about the same as for lists.

dv = d.values()

%%timeit
for i in range(n_large):
    i in dv
# 990 ms ± 28.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
source: in_timeit.py
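If you need to test many values, the same trick as for lists applies: build a set of the values once. This sketch assumes the values are hashable, and the resulting set is a snapshot that does not reflect later changes to the dictionary.

```python
d = dict(zip(range(10000), range(10000)))

value_set = set(d.values())  # one O(n) pass; requires hashable values

hits = sum(i in value_set for i in range(10000))
print(hits)
# 10000
```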

Key-value pairs are unique. in for items() looks up the key by hash and then compares the stored value, so its execution time is roughly that of a set plus a small overhead.

di = d.items()

%%timeit
for i in range(n_large):
    (i, i) in di
# 1.18 ms ± 26.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
source: in_timeit.py
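The extra overhead comes from that second step. The pure-Python sketch below reproduces the logic of `(key, value) in d.items()`: a hash lookup of the key followed by a comparison of the stored value. `has_pair` is a hypothetical helper, not a dict method, and the sentinel `_missing` is used so that a stored `None` is not mistaken for an absent key.

```python
_missing = object()  # sentinel distinct from any stored value


def has_pair(d, key, value):
    """Behave like (key, value) in d.items()."""
    stored = d.get(key, _missing)  # hash lookup, same cost as `key in d`
    return stored is not _missing and (stored is value or stored == value)


d = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

print(has_pair(d, 'key1', 'value1'), ('key1', 'value1') in d.items())
# True True

print(has_pair(d, 'key1', 'value2'), ('key1', 'value2') in d.items())
# False False
```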

"in" in for statements and list comprehensions

The word in is also used in for statements and list comprehensions.

l = [0, 1, 2]

for i in l:
    print(i)
# 0
# 1
# 2
source: in_basic.py
print([i * 10 for i in l])
# [0, 10, 20]
source: in_basic.py

See the following articles for details on for statements and list comprehensions.

Note that the in operator can also appear as a condition in a list comprehension, which can be confusing.

l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_in = [s for s in l if 'XXX' in s]
print(l_in)
# ['oneXXXaaa', 'twoXXXbbb']

The first in is part of the list comprehension syntax (for s in l), and the second in is the in operator (here, a substring test).
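Writing the same comprehension as an explicit for loop can make the two roles of in easier to tell apart:

```python
l = ['oneXXXaaa', 'twoXXXbbb', 'three999aaa', '000111222']

l_in = []
for s in l:             # `in` as part of the for statement (iteration)
    if 'XXX' in s:      # `in` as the membership operator (substring test)
        l_in.append(s)

print(l_in)
# ['oneXXXaaa', 'twoXXXbbb']
```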
