Search for a string in Python (Check if a substring is included/Get a substring position)
This article explains how to search a string to check if it contains a specific substring and to get its location in Python. The re module in the standard library allows more flexible searches using regular expressions.
- Check if a string contains a given substring: in
- Get the position (index) of a given substring: find(), rfind()
- Case-insensitive search
- Check and get a position with regex: re.search()
- Get all results with regex: re.findall(), re.finditer()
- Search multiple strings with regex
- Use special characters and sequences
- Case-insensitive search with regex: re.IGNORECASE
See the following article on how to count specific characters or substrings in a string.
See the following articles on how to extract, replace, and compare strings.
- Extract a substring from a string in Python (position, regex)
- Replace strings in Python (replace, translate, re.sub, re.subn)
- String comparison in Python (exact/partial match, etc.)
If you want to search the contents of a text file, read the file as a string.
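As a minimal sketch of that idea, the whole file can be read into one string and then searched with in. This assumes the file is UTF-8 encoded and small enough to read into memory at once; file_contains() is a hypothetical helper, and the temporary file exists only to keep the example self-contained.

```python
import os
import tempfile

def file_contains(path, substring, encoding='utf-8'):
    """Hypothetical helper: return True if the text file at path contains substring."""
    with open(path, encoding=encoding) as f:
        text = f.read()  # read the whole file into a single string
    return substring in text

# Write a throwaway sample file so the example runs on its own
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')
    with open(path, 'w', encoding='utf-8') as f:
        f.write('I am Sam\nSam I am\n')
    found = file_contains(path, 'Sam')
    missing = file_contains(path, 'XXX')

print(found)    # True
print(missing)  # False
```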
Check if a string contains a given substring: in
Use the in operator to check if a string contains a given substring.
The in operator is case-sensitive, and the same applies to the string methods described below. You can check for multiple substrings by combining in expressions with the and and or operators.
s = 'I am Sam'
print('Sam' in s)
# True
print('sam' in s)
# False
print('I' in s and 'Sam' in s)
# True
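When there are many substrings, chaining and or or gets long. One option, sketched here with a hypothetical list of search terms, is to pass a generator expression to the built-in functions all() and any().

```python
s = 'I am Sam'
substrings = ['I', 'Sam', 'XXX']  # hypothetical list of substrings to look for

# all() is True only if every substring occurs in s (like chaining `and`)
has_all = all(sub in s for sub in substrings)

# any() is True if at least one substring occurs in s (like chaining `or`)
has_any = any(sub in s for sub in substrings)

print(has_all)  # False, because 'XXX' does not occur in s
print(has_any)  # True
```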
For more complex operations, consider using regular expressions, as described in the following sections.
Note that the in operator can also be used for lists, tuples, and dictionaries (for dictionaries, it checks the keys). See the following article for details.
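A short illustration of how in behaves on these containers. The values are arbitrary examples; the point is that lists and tuples compare whole elements, and dictionaries check keys rather than values.

```python
l = [0, 1, 2]
t = ('a', 'b', 'c')
d = {'key1': 'value1', 'key2': 'value2'}

print(1 in l)                  # True
print('x' in t)                # False
print('Sa' in ['Sam'])         # False: elements are compared for equality, not as substrings
print('key1' in d)             # True: `in` checks dictionary keys
print('value1' in d)           # False: values are not checked
print('value1' in d.values())  # True: search the values explicitly
```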
Get the position (index) of a given substring: find(), rfind()
You can get the position of a given substring in the string with the find() method of str.
If the substring specified as the first argument is found, the method returns its starting position (the position of the first character); if not found, -1 is returned.
s = 'I am Sam'
print(s.find('Sam'))
# 5
print(s.find('XXX'))
# -1
In Python, the index of the first character in a string is 0.
I am Sam
01234567
If there are multiple occurrences of the substring, the position of the first occurrence (the leftmost substring) is returned.
To find all occurrences, you can adjust the range with the start and end arguments; however, using the regex approach described below is more convenient.
print(s.find('am'))
# 2
By specifying the second argument start and the third argument end, the search is limited to the slice [start:end]. The returned position is still counted from the beginning of the whole string.
print(s.find('am', 3))
# 6
print(s.find('am', 3, 5))
# -1
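The start argument can be used in a loop to collect every occurrence with find() alone. This is a sketch; find_all() is a hypothetical helper name. Because each search resumes one character after the previous hit, overlapping occurrences are included, unlike the regex functions shown later, which return non-overlapping matches. Resuming from i + len(sub) instead would skip overlaps.

```python
def find_all(s, sub):
    """Hypothetical helper: start index of every occurrence of sub in s, overlaps included."""
    positions = []
    i = s.find(sub)
    while i != -1:
        positions.append(i)
        # Resume the search one character after the previous match
        i = s.find(sub, i + 1)
    return positions

s = 'I am Sam'
print(find_all(s, 'am'))       # [2, 6]
print(find_all(s, 'XXX'))      # []
print(find_all('aaaa', 'aa'))  # [0, 1, 2] (overlapping matches)
```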
The rfind() method searches the string from the right side.
If the substring occurs multiple times, the position of the rightmost occurrence is returned. As with find(), you can also specify start and end arguments for the rfind() method.
print(s.rfind('am'))
# 6
print(s.rfind('XXX'))
# -1
print(s.rfind('am', 2))
# 6
print(s.rfind('am', 2, 5))
# 2
The index() and rindex() methods are similar to find() and rfind(). If the specified substring does not exist, find() and rfind() return -1, but index() and rindex() raise a ValueError.
- Built-in Types - str.index() — Python 3.11.3 documentation
- Built-in Types - str.rindex() — Python 3.11.3 documentation
print(s.index('am'))
# 2
# print(s.index('XXX'))
# ValueError: substring not found
print(s.rindex('am'))
# 6
# print(s.rindex('XXX'))
# ValueError: substring not found
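The error from index() can be handled with try and except. This matters because -1 from find() is itself a valid index (the last character), so passing it straight into indexing or slicing fails silently instead of raising. index_or_none() below is a hypothetical helper for illustration.

```python
s = 'I am Sam'

def index_or_none(s, sub):
    """Hypothetical helper: position of sub in s, or None if sub is not found."""
    try:
        return s.index(sub)
    except ValueError:
        # str.index() raises ValueError when the substring is absent
        return None

print(index_or_none(s, 'am'))   # 2
print(index_or_none(s, 'XXX'))  # None

# -1 from find() silently refers to the last character instead of failing
print(s[s.find('XXX')])  # m
```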
Case-insensitive search
Note that the in operator and the string methods mentioned so far are case-sensitive.
For case-insensitive searches, convert both the search string and the target string to uppercase or lowercase. Use the upper() method to convert a string to uppercase, and the lower() method to convert it to lowercase.
s = 'I am Sam'
print(s.upper())
# I AM SAM
print(s.lower())
# i am sam
print('sam' in s)
# False
print('sam' in s.lower())
# True
print(s.find('sam'))
# -1
print(s.lower().find('sam'))
# 5
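For text beyond plain ASCII, str.casefold() is a more aggressive alternative to lower() that is intended for caseless comparison. A sketch, with contains_ignore_case() as a hypothetical helper name; note that casefolding can change the length of a string (for example 'ß' becomes 'ss'), so positions found in a casefolded string may not map back to the original.

```python
def contains_ignore_case(s, sub):
    """Hypothetical helper: case-insensitive substring check using str.casefold()."""
    return sub.casefold() in s.casefold()

print(contains_ignore_case('I am Sam', 'sam'))  # True

# casefold() also folds characters that lower() leaves as-is, e.g. German 'ß'
print('STRASSE'.lower() in 'Straße'.lower())      # False
print(contains_ignore_case('Straße', 'STRASSE'))  # True
```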
Check and get a position with regex: re.search()
Use regular expressions with the re module of the standard library.
Use re.search() to check if a string contains a given substring with regex.
The first argument is a regex pattern, and the second is the target string. Although special characters and sequences can be used in the regex pattern, the following example uses the simplest possible pattern: the plain string itself.
If the pattern matches, a match object is returned; otherwise, None is returned.
import re
s = 'I am Sam'
print(re.search('Sam', s))
# <re.Match object; span=(5, 8), match='Sam'>
print(re.search('XXX', s))
# None
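Since a match object is truthy and None is falsy, the return value of re.search() can be used directly as a condition. A small sketch with arbitrary patterns:

```python
import re

s = 'I am Sam'
results = []

for pattern in ['Sam', 'XXX']:
    # A Match object is truthy and None is falsy, so the result of
    # re.search() can be tested directly in an if statement
    if re.search(pattern, s):
        results.append(f'{pattern} found')
    else:
        results.append(f'{pattern} not found')

print(results)  # ['Sam found', 'XXX not found']
```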
The methods of the match object provide details about the match.
group() returns the matched string, start() returns the start position, end() returns the end position (the index just after the last matched character), and span() returns a tuple of (start position, end position).
m = re.search('Sam', s)
print(m.group())
# Sam
print(m.start())
# 5
print(m.end())
# 8
print(m.span())
# (5, 8)
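Because end() is exclusive, start() and end() follow the same convention as slicing, so they can be used directly as slice bounds. A brief sketch using the same string:

```python
import re

s = 'I am Sam'
m = re.search('Sam', s)

# start() and end() can be used directly as slice bounds
print(s[m.start():m.end()])               # Sam
print(s[m.start():m.end()] == m.group())  # True
print(repr(s[:m.start()]))                # 'I am ' (text before the match)
```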
Get all results with regex: re.findall(), re.finditer()
re.search() returns a match object for the first match only, even if there are multiple matching occurrences in the string.
s = 'I am Sam'
print(re.search('am', s))
# <re.Match object; span=(2, 4), match='am'>
re.findall() returns all matching parts as a list of strings.
print(re.findall('am', s))
# ['am', 'am']
To get the positions of all matching parts, use re.finditer() along with a list comprehension.
print([m.span() for m in re.finditer('am', s)])
# [(2, 4), (6, 8)]
In the above example, span() is used so that a list of (start position, end position) tuples is returned. If you want a list of only start or end positions, use start() or end().
Note that re.finditer() returns an iterator that yields a match object for each match.
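Since re.finditer() yields match objects one at a time, a plain for loop can read each match's string and positions together, and a list comprehension with start() gives only the start positions. A sketch with the same string as above:

```python
import re

s = 'I am Sam'

# Each item yielded by re.finditer() is a Match object
for m in re.finditer('am', s):
    print(m.group(), m.start(), m.end())
# am 2 4
# am 6 8

starts = [m.start() for m in re.finditer('am', s)]
print(starts)  # [2, 6]
```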
Search multiple strings with regex
Even if you do not have much experience with regular expressions, it is helpful to know the | symbol.
If the regex pattern is A|B, it matches A or B. A and B can be plain strings (special characters and sequences also work), and you can use A|B|C for three or more alternatives.
You can search for multiple strings as follows.
s = 'I am Sam Adams'
print(re.findall('Sam|Adams', s))
# ['Sam', 'Adams']
print([m.span() for m in re.finditer('Sam|Adams', s)])
# [(5, 8), (9, 14)]
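When the search terms come from a list, the A|B|C pattern can be built with '|'.join(). If a term may contain regex special characters such as ( ) or ., passing it through re.escape() makes it match literally. A sketch with hypothetical search terms; since alternatives are tried from left to right, a term that is a prefix of another (e.g. 'Sam' and 'Samuel') should be placed after the longer one.

```python
import re

s = 'I am Sam Adams (Jr.)'
words = ['Sam', 'Adams', '(Jr.)']  # hypothetical search terms

# re.escape() backslash-escapes special characters such as ( ) and .,
# and '|'.join() combines the escaped terms into one alternation pattern
pattern = '|'.join(re.escape(w) for w in words)

print(re.findall(pattern, s))  # ['Sam', 'Adams', '(Jr.)']
```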
Use special characters and sequences
Using special characters and sequences in regex patterns allows for more complex searches.
s = 'I am Sam Adams'
print(re.findall('am', s))
# ['am', 'am', 'am']
print(re.findall('[a-zA-Z]+am[a-z]*', s))
# ['Sam', 'Adams']
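Another commonly used special sequence is \b, which matches a word boundary, so a pattern like r'\bam\b' finds am only as a whole word and not inside Sam or Adams. The pattern is written as a raw string (r'...') because in an ordinary string literal '\b' is the backspace character.

```python
import re

s = 'I am Sam Adams'

# \b matches a word boundary; the raw string keeps the backslash intact
print(re.findall(r'\bam\b', s))                       # ['am']
print([m.span() for m in re.finditer(r'\bam\b', s)])  # [(2, 4)]
```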
See the following article for basic examples of utilizing regex patterns, such as wildcard-like patterns.
Case-insensitive search with regex: re.IGNORECASE
You can specify re.IGNORECASE as the flags argument of functions such as re.search() and re.findall() to search case-insensitively.
s = 'I am Sam'
print(re.search('sam', s))
# None
print(re.search('sam', s, flags=re.IGNORECASE))
# <re.Match object; span=(5, 8), match='Sam'>
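If the same case-insensitive pattern is used repeatedly, the flag can also be given once to re.compile(), and the resulting pattern object's search() and findall() methods are then called without a flags argument. Alternatively, the inline flag (?i) at the start of the pattern has the same effect as re.IGNORECASE. A sketch:

```python
import re

s = 'I am Sam'

# Compile once with the flag, then reuse the pattern object
pattern = re.compile('sam', re.IGNORECASE)
print(pattern.search(s))   # <re.Match object; span=(5, 8), match='Sam'>
print(pattern.findall(s))  # ['Sam']

# The inline flag (?i) at the start of the pattern works the same way
print(re.findall('(?i)sam', s))  # ['Sam']
```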