String comparison in Python (exact/partial match, etc.)
This article explains string comparisons in Python, including topics such as exact match, partial match, forward/backward match, and more.
If you want to find the position of a substring in a string, see the following article.
Exact match (equality comparison): ==, !=
As with numbers, the == operator checks whether two strings are equal. It returns True if they are equal and False otherwise.
print('abc' == 'abc')
# True
print('abc' == 'xyz')
# False
This operation is case-sensitive, as are other comparison operators and methods. Case-insensitive comparisons are discussed later.
print('abc' == 'ABC')
# False
!= returns True if the two strings are not equal, and False if they are equal.
print('abc' != 'xyz')
# True
print('abc' != 'abc')
# False
Partial match: in, not in
To check for partial matches, use the in operator, which determines whether one string contains another.
x in y returns True if x is contained in y (i.e., x is a substring of y), and False if it is not. If the characters of x appear individually in y but not as one contiguous sequence, False is returned.
print('bbb' in 'aaa-bbb-ccc')
# True
print('xxx' in 'aaa-bbb-ccc')
# False
print('abc' in 'aaa-bbb-ccc')
# False
not in returns True if the substring is not included, and False if it is included.
print('xxx' not in 'aaa-bbb-ccc')
# True
print('bbb' not in 'aaa-bbb-ccc')
# False
in and not in are also used to check the existence of elements in a list. See the following article for details.
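As a minimal sketch of the difference, the example below applies in to a string and to a list. The variable names are chosen for illustration only. With a list, in tests whether any element is equal to the left operand, not whether it is a substring of an element.

```python
text = 'aaa-bbb-ccc'
words = ['aaa', 'bbb', 'ccc']

# Substring test on a string
in_text = 'bb' in text
print(in_text)
# True

# Element test on a list: 'bb' is not equal to any element
in_words = 'bb' in words
print(in_words)
# False

# To find list elements that contain a substring, test each element
matches = [w for w in words if 'bb' in w]
print(matches)
# ['bbb']
```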
Forward/backward match: startswith(), endswith()
For forward matching, use the string method startswith(), which checks whether a string begins with the specified string.
s = 'aaa-bbb-ccc'
print(s.startswith('aaa'))
# True
print(s.startswith('bbb'))
# False
You can also specify a tuple of strings.
True is returned if the string starts with any element of the tuple; otherwise, False is returned. Note that specifying a list instead of a tuple raises an error.
print(s.startswith(('aaa', 'bbb', 'ccc')))
# True
print(s.startswith(('xxx', 'yyy', 'zzz')))
# False
# print(s.startswith(['a', 'b', 'c']))
# TypeError: startswith first arg must be str or a tuple of str, not list
For backward matching, use the string method endswith(), which checks whether a string ends with the specified string. Its usage is the same as startswith().
print(s.endswith('ccc'))
# True
print(s.endswith('bbb'))
# False
print(s.endswith(('aaa', 'bbb', 'ccc')))
# True
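If the candidate strings are held in a list, one possible approach is to convert the list with tuple() before passing it. The sketch below filters hypothetical file names by extension; the data and variable names are assumptions for illustration. Note that the comparison remains case-sensitive, so 'd.TXT' is not selected.

```python
filenames = ['a.txt', 'b.csv', 'c.png', 'd.TXT']  # hypothetical sample data
extensions = ['.txt', '.csv']

# endswith() does not accept a list, so convert it to a tuple first
selected = [f for f in filenames if f.endswith(tuple(extensions))]
print(selected)
# ['a.txt', 'b.csv']
```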
Order comparison: <, <=, >, >=
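You can compare strings with the <, <=, >, and >= operators, just like numbers. Strings are compared in lexicographical order: characters are compared one by one from the start, and if one string is a prefix of the other, the shorter string is considered smaller.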
print('a' < 'b')
# True
print('aa' < 'ab')
# True
print('abc' < 'abcd')
# True
Characters are compared based on their Unicode code points.
You can get the Unicode code point of a character with the built-in ord() function.
print(ord('a'))
# 97
print(ord('b'))
# 98
Uppercase English letters have smaller code points than their corresponding lowercase letters in the Unicode standard.
print('Z' < 'a')
# True
print(ord('Z'))
# 90
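To illustrate how the ordering follows from code points, the sketch below converts each string to a list of code points and compares the lists. The helper code_points() is a hypothetical name introduced here; Python lists are also compared element by element, so the two comparisons should agree.

```python
def code_points(s):
    """Return the Unicode code points of s as a list (illustrative helper)."""
    return [ord(c) for c in s]

a, b = 'abc', 'abcd'
print(code_points(a))
# [97, 98, 99]
print(code_points(b))
# [97, 98, 99, 100]

# The string comparison gives the same result as the list comparison
same_result = (a < b) == (code_points(a) < code_points(b))
print(same_result)
# True
```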
When a list of strings is sorted with the list method sort() or the built-in sorted() function, the order is also determined by Unicode code points.
print(sorted(['aaa', 'abc', 'Abc', 'ABC']))
# ['ABC', 'Abc', 'aaa', 'abc']
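If a case-insensitive order is preferred, one option is to pass str.lower as the key argument of sorted(). This is a minimal sketch; elements whose lowercase forms are equal keep their original relative order because the sort is stable.

```python
l = ['aaa', 'abc', 'Abc', 'ABC']

# Compare the lowercase form of each element instead of the element itself
sorted_ignore_case = sorted(l, key=str.lower)
print(sorted_ignore_case)
# ['aaa', 'abc', 'Abc', 'ABC']
```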
Case-insensitive comparison: upper(), lower()
All comparison operators and methods described so far are case-sensitive.
For case-insensitive comparisons, use upper() or lower() to convert both strings to uppercase or lowercase before comparing them.
s1 = 'abc'
s2 = 'ABC'
print(s1 == s2)
# False
print(s1.lower() == s2.lower())
# True
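For text that is not limited to ASCII letters, lower() may not be enough. The string method casefold() (available since Python 3.3) applies more aggressive case folding and is one option for such cases. The example strings below are chosen only to illustrate the difference.

```python
s1 = 'Straße'
s2 = 'STRASSE'

# lower() keeps 'ß' as is, so the strings differ
lower_equal = s1.lower() == s2.lower()
print(lower_equal)
# False

# casefold() converts 'ß' to 'ss', so the strings match
casefold_equal = s1.casefold() == s2.casefold()
print(casefold_equal)
# True
```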
Regex: re.search(), re.fullmatch()
Regular expressions allow for more flexible string comparisons.
re.search()
Use re.search() for partial, forward, and backward matching. Note that re.match() can also be used for forward matching, but it is not discussed here.
While various metacharacters (special characters) can be used in regular expression patterns, you can also directly specify a string without any metacharacters. If the string is included, a match object is returned; otherwise, None is returned. Match objects are always evaluated as True.
import re
s = 'aaa-AAA-123'
print(re.search('aaa', s))
# <re.Match object; span=(0, 3), match='aaa'>
print(re.search('xxx', s))
# None
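Because a match object is truthy and None is falsy, the result can be used directly in a condition. The following is a minimal sketch of that pattern; the variable name m is arbitrary.

```python
import re

s = 'aaa-AAA-123'

# A successful search returns a match object, which is truthy
m = re.search('AAA', s)
if m:
    print(m.group(), m.start(), m.end())
# AAA 4 7

# A failed search returns None
no_match = re.search('xxx', s)
if no_match is None:
    print('no match')
# no match
```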
The metacharacter ^ matches the start of the string, and $ matches the end of the string.
print(re.search('^aaa', s))
# <re.Match object; span=(0, 3), match='aaa'>
print(re.search('^123', s))
# None
print(re.search('aaa$', s))
# None
print(re.search('123$', s))
# <re.Match object; span=(8, 11), match='123'>
In addition, a variety of other metacharacters and special sequences are available.
For example, [A-Z] represents any single uppercase ASCII letter, and + means that the preceding pattern is repeated one or more times. Thus, [A-Z]+ matches any substring that consists of one or more consecutive uppercase ASCII letters.
print(re.search('[A-Z]+', s))
# <re.Match object; span=(4, 7), match='AAA'>
See the following article for basic examples of regular expression patterns, such as wildcard-like patterns.
re.fullmatch()
Use re.fullmatch() to check whether the entire string matches a regular expression pattern. Even if part of the string matches, None is returned unless the whole string matches.
s = '012-3456-7890'
print(re.fullmatch(r'\d{3}-\d{4}-\d{4}', s))
# <re.Match object; span=(0, 13), match='012-3456-7890'>
s = 'tel: 012-3456-7890'
print(re.fullmatch(r'\d{3}-\d{4}-\d{4}', s))
# None
\d represents a digit and {n} represents exactly n repetitions of the preceding pattern. Since the backslash \ is used in special sequences of regular expressions, such as \d, it is convenient to use raw strings (r'' or r""), which treat backslashes as literal characters.
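The effect of the r prefix can be checked with a short example. In a normal string, \n is a single newline character, while in a raw string it remains two characters, a backslash followed by n.

```python
normal = '\n'   # one character: newline
raw = r'\n'     # two characters: backslash and 'n'

print(len(normal), len(raw))
# 1 2

# A raw string is equivalent to escaping each backslash manually
same = r'\d{3}' == '\\d{3}'
print(same)
# True
```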
re.fullmatch() was added in Python 3.4. In earlier versions, you can use re.search() with ^ and $ to achieve nearly the same result (by default, $ also matches just before a trailing newline). You can also use re.match() and $, although it is not shown here.
s = '012-3456-7890'
print(re.search(r'^\d{3}-\d{4}-\d{4}$', s))
# <re.Match object; span=(0, 13), match='012-3456-7890'>
s = 'tel: 012-3456-7890'
print(re.search(r'^\d{3}-\d{4}-\d{4}$', s))
# None
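The two approaches can differ when the string ends with a newline, because $ by default also matches just before a trailing newline. The sketch below illustrates this with an assumed input; \A and \Z, which match only at the very start and end of the string, are shown as one possible stricter alternative.

```python
import re

s = '012-3456-7890\n'  # note the trailing newline
pattern = r'\d{3}-\d{4}-\d{4}'

# fullmatch() requires the whole string, including the newline, to match
full = re.fullmatch(pattern, s)
print(full)
# None

# $ matches before the trailing newline, so this search succeeds
dollar = re.search('^' + pattern + '$', s)
print(dollar)
# <re.Match object; span=(0, 13), match='012-3456-7890'>

# \A and \Z anchor to the absolute start and end of the string
strict = re.search(r'\A' + pattern + r'\Z', s)
print(strict)
# None
```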
re.IGNORECASE
To perform case-insensitive comparisons, specify re.IGNORECASE as the flags argument for functions like re.search() and re.fullmatch().
s = 'ABC'
print(re.search('abc', s))
# None
print(re.search('abc', s, re.IGNORECASE))
# <re.Match object; span=(0, 3), match='ABC'>
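The same flag works with re.fullmatch() and with patterns compiled by re.compile(). The following is a minimal sketch; re.I is the short alias of re.IGNORECASE, and the sample strings are chosen for illustration.

```python
import re

s = 'ABC'

# Pass the flag by keyword to re.fullmatch()
m_full = re.fullmatch('abc', s, flags=re.IGNORECASE)
print(m_full)
# <re.Match object; span=(0, 3), match='ABC'>

# A compiled pattern keeps the flag for every later call
pattern = re.compile('abc', re.I)
compiled_result = bool(pattern.fullmatch('aBc'))
print(compiled_result)
# True
```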