Search for a string in Python (Check if a substring is included/Get a substring position)


This article explains how to check whether a string contains a specific substring and how to get the substring's position in Python. The re module in the standard library enables more flexible searches using regular expressions.

See the following article on how to count specific characters or substrings in a string.

See the following articles on how to extract, replace, and compare strings.

If you want to search the contents of a text file, read the file as a string.

Check if a string contains a given substring: in

Use the in operator to check if a string contains a given substring.

The in operator is case-sensitive, and the same applies to the string methods described below. To check for multiple substrings, combine in expressions with the and and or operators.

s = 'I am Sam'

print('Sam' in s)
# True

print('sam' in s)
# False

print('I' in s and 'Sam' in s)
# True
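When the substrings to look for are held in a list, chaining and and or becomes unwieldy. The following is a minimal sketch using the built-in all() and any() with a generator expression; the list name keywords is only for illustration.

```python
s = 'I am Sam'
keywords = ['I', 'Sam']  # illustrative list of substrings

# True only if every substring occurs in s (like chaining with and)
print(all(k in s for k in keywords))
# True

# True if at least one substring occurs in s (like chaining with or)
print(any(k in s for k in ['sam', 'XXX']))
# False
```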

For more complex operations, consider using regular expressions, as described in the following sections.

Note that the in operator can also be used for lists, tuples, and dictionaries. See the following article for details.

Get the position (index) of a given substring: find(), rfind()

You can get the position of a given substring in the string with the find() method of str.

If the substring specified as the first argument is found, the method returns its starting position (the position of the first character); if not found, -1 is returned.

s = 'I am Sam'

print(s.find('Sam'))
# 5

print(s.find('XXX'))
# -1
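Because find() returns 0 when the substring is at the very beginning and -1 when it is missing, using its return value directly as a condition gives the wrong answer (0 is falsy and -1 is truthy). Compare the result with -1 instead, or use the in operator when you only need a yes/no answer.

```python
s = 'Sam is here'

# Wrong: 'Sam' is found at index 0, which is falsy
print(bool(s.find('Sam')))
# False

# Wrong: -1 (not found) is truthy
print(bool(s.find('XXX')))
# True

# Right: compare the result with -1
print(s.find('Sam') != -1)
# True
```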

In Python, the index of the first character in a string is 0.

I am Sam
01234567

If there are multiple occurrences of the substring, the position of the first occurrence (the leftmost substring) is returned.

To find all occurrences, you can call find() repeatedly while adjusting the search range with the start and end arguments; however, the regex approach described below is more convenient.

print(s.find('am'))
# 2

By specifying the second argument start and the third argument end, you limit the search to the slice [start:end]. The returned position is still counted from the beginning of the whole string, not from start.

print(s.find('am', 3))
# 6

print(s.find('am', 3, 5))
# -1
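As mentioned above, all occurrences can be collected with find() alone by repeatedly moving the start argument past the previous match. The following is a minimal sketch; find_all() is a hypothetical helper name, not a str method. It advances by len(sub), so matches do not overlap, which mirrors how re.finditer() scans the string.

```python
def find_all(s, sub):
    """Return the start indices of all non-overlapping occurrences of sub in s."""
    if not sub:
        # An empty sub would match at every position and never advance
        raise ValueError('sub must not be empty')
    positions = []
    i = s.find(sub)
    while i != -1:
        positions.append(i)
        # Resume the search just after the current match
        i = s.find(sub, i + len(sub))
    return positions

s = 'I am Sam'
print(find_all(s, 'am'))
# [2, 6]
```

To count overlapping occurrences instead (for example, 'aa' in 'aaaa' at 0, 1, and 2), you could advance by 1 rather than len(sub).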

The rfind() method searches the string starting from the right side.

If multiple substrings are present, the position of the rightmost substring is returned. Similar to find(), you can also specify start and end arguments for the rfind() method.

print(s.rfind('am'))
# 6

print(s.rfind('XXX'))
# -1

print(s.rfind('am', 2))
# 6

print(s.rfind('am', 2, 5))
# 2
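A common use of rfind() is locating the last occurrence of a separator, such as the last dot in a file name. This sketch contrasts find() and rfind() on a hypothetical file name.

```python
filename = 'archive.tar.gz'  # hypothetical example

print(filename.find('.'))
# 7

print(filename.rfind('.'))
# 11

# Everything after the last dot
print(filename[filename.rfind('.') + 1:])
# gz
```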

The index() and rindex() methods work like find() and rfind(). The difference is that when the substring is not found, find() and rfind() return -1, while index() and rindex() raise a ValueError.

print(s.index('am'))
# 2

# print(s.index('XXX'))
# ValueError: substring not found

print(s.rindex('am'))
# 6

# print(s.rindex('XXX'))
# ValueError: substring not found
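Since index() raises ValueError, it fits code where a missing substring should be handled as an exception. The following is a minimal sketch using try and except; the helper name position_or_none() is hypothetical.

```python
def position_or_none(s, sub):
    """Return the index of the first occurrence of sub in s, or None if absent."""
    try:
        return s.index(sub)
    except ValueError:
        # index() raises ValueError when sub is not found
        return None

s = 'I am Sam'

print(position_or_none(s, 'Sam'))
# 5

print(position_or_none(s, 'XXX'))
# None
```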

Note that the in operator and the string methods mentioned so far are case-sensitive.

For case-insensitive searches, you can convert both the search string and target string to uppercase or lowercase. Use the upper() method to convert a string to uppercase, and the lower() method to convert it to lowercase.

s = 'I am Sam'

print(s.upper())
# I AM SAM

print(s.lower())
# i am sam

print('sam' in s)
# False

print('sam' in s.lower())
# True

print(s.find('sam'))
# -1

print(s.lower().find('sam'))
# 5
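Wrapping the conversion in a small function keeps both sides lowercased consistently. The following is a minimal sketch; find_ignore_case() is a hypothetical name. It assumes that lowercasing does not change the length of the string, which holds for ASCII text but not for every Unicode character (for example, 'İ'.lower() consists of two characters), so the returned index could be shifted in such cases.

```python
def find_ignore_case(s, sub):
    """Case-insensitive find(): lowercase both strings, then search.

    Assumes lower() keeps the length of s unchanged (true for ASCII text).
    """
    return s.lower().find(sub.lower())

s = 'I am Sam'

print(find_ignore_case(s, 'SAM'))
# 5

print(find_ignore_case(s, 'xxx'))
# -1
```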

Check and get a position with regex: re.search()

Regular expressions are available through the re module of the standard library.

Use re.search() to check whether a string contains a match for a regex pattern.

The first argument is a regex pattern, and the second is the target string. Regex patterns can contain special characters and sequences, but the following example uses the simplest kind of pattern: a plain string.

If the pattern matches, a match object is returned; otherwise, None is returned.

import re

s = 'I am Sam'

print(re.search('Sam', s))
# <re.Match object; span=(5, 8), match='Sam'>

print(re.search('XXX', s))
# None
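Because a match object is always truthy and None is falsy, the result of re.search() can be used directly as the condition of an if statement, as in this short sketch.

```python
import re

s = 'I am Sam'

for pattern in ['Sam', 'XXX']:
    # A match object is truthy and None is falsy
    if re.search(pattern, s):
        print(f'{pattern!r}: found')
    else:
        print(f'{pattern!r}: not found')
# 'Sam': found
# 'XXX': not found
```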

The match object provides methods for retrieving details about the match.

group() returns the matched string, start() returns the start position, end() returns the end position (the index just after the last matched character), and span() returns a tuple of (start position, end position).

m = re.search('Sam', s)

print(m.group())
# Sam

print(m.start())
# 5

print(m.end())
# 8

print(m.span())
# (5, 8)
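The positions from start() and end() follow the same convention as slicing: end() is exclusive, so slicing the target string with them reproduces group(). A quick check:

```python
import re

s = 'I am Sam'
m = re.search('Sam', s)

# end() is exclusive, so s[start:end] is exactly the matched text
print(s[m.start():m.end()])
# Sam

print(s[m.start():m.end()] == m.group())
# True

# span() is (start(), end()) packed into a tuple
print(m.span() == (m.start(), m.end()))
# True
```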

Get all results with regex: re.findall(), re.finditer()

re.search() returns only the first match object, even if there are multiple matching occurrences in the string.

s = 'I am Sam'

print(re.search('am', s))
# <re.Match object; span=(2, 4), match='am'>

re.findall() returns all matching parts as a list of strings.

print(re.findall('am', s))
# ['am', 'am']

To get the positions of all matches, use re.finditer() with a list comprehension.

print([m.span() for m in re.finditer('am', s)])
# [(2, 4), (6, 8)]

In the above example, span() returns a (start position, end position) tuple for each match, so the result is a list of tuples. To get a list of only start or end positions, use start() or end() instead.

Note that re.finditer() returns an iterator that yields a match object for each match.
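Because re.finditer() returns an iterator, match objects are produced one at a time and the iterator can be consumed only once. The sketch below collects start positions, as described above, and shows that a second pass over the same iterator yields nothing.

```python
import re

s = 'I am Sam'

starts = [m.start() for m in re.finditer('am', s)]
print(starts)
# [2, 6]

it = re.finditer('am', s)
print(next(it).span())
# (2, 4)

print([m.span() for m in it])  # only the remaining match
# [(6, 8)]

print(list(it))  # the iterator is now exhausted
# []
```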

Search multiple strings with regex

Even if you do not have much experience with regular expressions, it is helpful to know the | symbol.

The regex pattern A|B matches either A or B. A and B can be plain strings (or patterns containing special characters and sequences), and you can write A|B|C to match any of three or more alternatives.

You can search for multiple strings as follows.

s = 'I am Sam Adams'

print(re.findall('Sam|Adams', s))
# ['Sam', 'Adams']

print([m.span() for m in re.finditer('Sam|Adams', s)])
# [(5, 8), (9, 14)]
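When the strings to search for are stored in a list, the pattern can be built with '|'.join(). The following is a minimal sketch; find_any() is a hypothetical helper. It passes each string through re.escape() so that characters such as . or + are matched literally, and it sorts the strings longest first, because an alternation tries its alternatives from left to right and stops at the first one that matches (re.findall('Sam|Samuel', 'Samuel') returns ['Sam']).

```python
import re

def find_any(words, s):
    """Return all non-overlapping occurrences of any of the literal strings in words."""
    # Longest first, so 'Samuel' is preferred over its prefix 'Sam'
    ordered = sorted(words, key=len, reverse=True)
    # re.escape() makes regex special characters match literally
    pattern = '|'.join(re.escape(w) for w in ordered)
    return re.findall(pattern, s)

print(find_any(['Sam', 'Adams'], 'I am Sam Adams'))
# ['Sam', 'Adams']

print(find_any(['Sam', 'Samuel'], 'Samuel and Sam'))
# ['Samuel', 'Sam']

print(find_any(['1+1'], '1+1=2'))
# ['1+1']
```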

Use special characters and sequences

Using special characters and sequences in regex patterns allows for more complex searches.

s = 'I am Sam Adams'

print(re.findall('am', s))
# ['am', 'am', 'am']

print(re.findall('[a-zA-Z]+am[a-z]*', s))
# ['Sam', 'Adams']
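Another frequently used special sequence is \b, which matches at a word boundary. Writing such patterns as raw strings (r'...') keeps Python from interpreting the backslash itself (in a normal string, '\b' is a backspace character). A short illustration:

```python
import re

s = 'I am Sam Adams'

# 'am' only as a whole word
print(re.findall(r'\bam\b', s))
# ['am']

# Whole words that contain 'am'
print(re.findall(r'\b\w*am\w*\b', s))
# ['am', 'Sam', 'Adams']
```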

See the following article for basic examples of utilizing regex patterns, such as wildcard-like patterns.

Case-insensitive search with regex: re.IGNORECASE

You can specify re.IGNORECASE as the flags argument of functions such as re.search() and re.findall() to perform a case-insensitive search.

s = 'I am Sam'

print(re.search('sam', s))
# None

print(re.search('sam', s, flags=re.IGNORECASE))
# <re.Match object; span=(5, 8), match='Sam'>
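re.I is a shorter alias for re.IGNORECASE, and the same flag can also be written inline at the start of the pattern as (?i). The flags argument works the same way with re.findall() and re.finditer():

```python
import re

s = 'I am Sam SAM'

print(re.findall('sam', s, flags=re.IGNORECASE))
# ['Sam', 'SAM']

print(re.findall('sam', s, flags=re.I))  # re.I is an alias of re.IGNORECASE
# ['Sam', 'SAM']

print(re.findall('(?i)sam', s))  # inline flag at the start of the pattern
# ['Sam', 'SAM']

print([m.span() for m in re.finditer('sam', s, flags=re.I)])
# [(5, 8), (9, 12)]
```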
