String comparison in Python (exact/partial match, etc.)

This article explains string comparisons in Python, including topics such as exact match, partial match, forward/backward match, and more.

Finding the position of a substring within a string is covered in a separate article.

Exact match (equality comparison): ==, !=

Similar to numbers, the == operator checks if two strings are equal. If they are equal, True is returned; otherwise, False is returned.

print('abc' == 'abc')
# True

print('abc' == 'xyz')
# False

This operation is case-sensitive, as are other comparison operators and methods. Case-insensitive comparisons are discussed later.

print('abc' == 'ABC')
# False

!= returns True if the two strings are not equal, and False if they are equal.

print('abc' != 'xyz')
# True

print('abc' != 'abc')
# False
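The short sketch below checks two points from the text. First, == compares the characters themselves, so strings built in different ways still compare equal. Second, != is always the negation of ==. The variable names are illustrative only.

```python
# == compares the characters, not how the strings were created
built = 'ab' + 'c'
print(built == 'abc')  # True

# != is always the logical negation of ==
pairs = [('abc', 'abc'), ('abc', 'xyz'), ('abc', 'ABC')]
for a, b in pairs:
    assert (a != b) == (not a == b)
    print(a, b, a == b, a != b)
```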

Partial match: in, not in

To check for partial matches, use the in operator, which determines if one string contains another string.

x in y returns True if x is contained in y (i.e., x is a substring of y), and False if it is not. If the characters of x appear in y only separately and not as one contiguous sequence, False is returned.

print('bbb' in 'aaa-bbb-ccc')
# True

print('xxx' in 'aaa-bbb-ccc')
# False

print('abc' in 'aaa-bbb-ccc')
# False

not in returns True if the substring is not included, and False if it is included.

print('xxx' not in 'aaa-bbb-ccc')
# True

print('bbb' not in 'aaa-bbb-ccc')
# False
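A few common patterns built on in are sketched below: filtering a list of strings by substring, checking for any of several substrings with the built-in any(), and the edge case of the empty string. The sample list is made up for illustration.

```python
s = 'aaa-bbb-ccc'

# Keep only the strings that contain a given substring
words = ['aaa-bbb', 'bbb-ccc', 'ccc-ddd']  # illustrative sample data
matches = [w for w in words if 'bbb' in w]
print(matches)  # ['aaa-bbb', 'bbb-ccc']

# Check whether any of several substrings occurs in s
print(any(sub in s for sub in ('xxx', 'bbb')))  # True

# The empty string counts as a substring of every string
print('' in s)  # True
```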

in and not in are also used to check whether an element exists in a list; that usage is covered in a separate article.

Forward/backward match: startswith(), endswith()

For forward matching, use the string method startswith(), which checks if a string begins with the specified string.

s = 'aaa-bbb-ccc'

print(s.startswith('aaa'))
# True

print(s.startswith('bbb'))
# False

You can also specify a tuple of strings.

True is returned if the string starts with any element of the tuple; otherwise, False is returned. Note that passing a list instead of a tuple raises a TypeError.

print(s.startswith(('aaa', 'bbb', 'ccc')))
# True

print(s.startswith(('xxx', 'yyy', 'zzz')))
# False

# print(s.startswith(['a', 'b', 'c']))
# TypeError: startswith first arg must be str or a tuple of str, not list
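If the candidate prefixes are already in a list, converting it with tuple() avoids the TypeError. startswith() also accepts optional start and end positions, so it can check for a prefix at a given index. A minimal sketch; the prefixes list is illustrative.

```python
s = 'aaa-bbb-ccc'

# A list (e.g. read from a config file) must be converted to a tuple first
prefixes = ['xxx', 'aaa']
print(s.startswith(tuple(prefixes)))  # True

# The optional start argument checks from a given index: s[4:] is 'bbb-ccc'
print(s.startswith('bbb', 4))  # True
print(s.startswith('bbb'))     # False
```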

For backward matching, use the string method endswith(), which checks if a string ends with the specified string. Its usage is the same as startswith().

print(s.endswith('ccc'))
# True

print(s.endswith('bbb'))
# False

print(s.endswith(('aaa', 'bbb', 'ccc')))
# True
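A typical use of endswith() with a tuple is filtering file names by extension. Because endswith() is case-sensitive, the second comprehension lowercases each name first, anticipating the case-insensitive technique described later. The file names are made up for illustration.

```python
files = ['report.txt', 'data.csv', 'image.png', 'NOTES.TXT']  # illustrative names

# Keep files whose name ends with any of the given extensions
selected = [f for f in files if f.endswith(('.txt', '.csv'))]
print(selected)  # ['report.txt', 'data.csv']

# endswith() is case-sensitive, so lowercase first to also catch 'NOTES.TXT'
selected_ci = [f for f in files if f.lower().endswith(('.txt', '.csv'))]
print(selected_ci)  # ['report.txt', 'data.csv', 'NOTES.TXT']
```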

Order comparison: <, <=, >, >=

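You can compare strings with the <, <=, >, and >= operators, just like numbers. Strings are compared in lexicographic order: the first pair of differing characters decides the result, and if one string is a prefix of the other, the shorter string is smaller.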

print('a' < 'b')
# True

print('aa' < 'ab')
# True

print('abc' < 'abcd')
# True

Characters are compared based on their Unicode code points.

You can get the Unicode code point of a character with the built-in ord() function.

print(ord('a'))
# 97

print(ord('b'))
# 98
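To make the rule concrete, here is a sketch that reimplements < for two strings in pure Python: walk both strings character by character, let the first differing pair decide by code point (via ord()), and if no pair differs, let the shorter string be smaller. lex_less is a hypothetical helper for illustration only; in real code, just use <.

```python
def lex_less(a: str, b: str) -> bool:
    """Hypothetical helper mirroring Python's a < b for str."""
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first differing character decides, by Unicode code point
            return ord(ca) < ord(cb)
    # All compared characters are equal: the shorter string is smaller
    return len(a) < len(b)

pairs = [('a', 'b'), ('aa', 'ab'), ('abc', 'abcd'), ('abcd', 'abc'), ('Z', 'a'), ('abc', 'abc')]
for a, b in pairs:
    # The sketch agrees with the built-in operator on every pair
    assert lex_less(a, b) == (a < b)
```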

All uppercase ASCII letters (A to Z, code points 65 to 90) have smaller code points than all lowercase ASCII letters (a to z, code points 97 to 122). As a result, even 'Z' is smaller than 'a'.

print('Z' < 'a')
# True

print(ord('Z'))
# 90

When a list of strings is sorted with the list method sort() or the built-in sorted() function, the order is also determined based on Unicode code points.

print(sorted(['aaa', 'abc', 'Abc', 'ABC']))
# ['ABC', 'Abc', 'aaa', 'abc']
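If you want an order that ignores case, both sorted() and list.sort() accept a key function; passing str.lower compares lowercased copies while returning the original strings. Python's sort is stable, so strings with equal keys keep their original relative order. A minimal sketch:

```python
words = ['aaa', 'abc', 'Abc', 'ABC']

# Default: ordered by code point, so uppercase sorts first
print(sorted(words))  # ['ABC', 'Abc', 'aaa', 'abc']

# key=str.lower compares lowercased copies; ties keep their original order
print(sorted(words, key=str.lower))  # ['aaa', 'abc', 'Abc', 'ABC']

# list.sort() takes the same key argument but sorts in place
words.sort(key=str.lower)
print(words)  # ['aaa', 'abc', 'Abc', 'ABC']
```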

Case-insensitive comparison: upper(), lower()

All comparison operators and methods described so far are case-sensitive.

For case-insensitive comparisons, use upper() or lower() to convert both strings to uppercase or lowercase.

s1 = 'abc'
s2 = 'ABC'

print(s1 == s2)
# False

print(s1.lower() == s2.lower())
# True
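For text beyond ASCII, lower() is not always enough. The str method casefold() applies a more aggressive conversion intended for caseless matching; for example, it turns the German 'ß' into 'ss', which lower() leaves unchanged. The same conversion also works with in and startswith(). A short sketch:

```python
s1 = 'Straße'
s2 = 'STRASSE'

# lower() leaves 'ß' unchanged, so these still differ: 'straße' vs 'strasse'
print(s1.lower() == s2.lower())  # False

# casefold() maps 'ß' to 'ss', so both become 'strasse'
print(s1.casefold() == s2.casefold())  # True

# Converting first also makes partial and forward matches case-insensitive
print('bbb' in 'AAA-BBB-CCC'.lower())           # True
print('AAA-BBB-CCC'.lower().startswith('aaa'))  # True
```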

Regex: re.search(), re.fullmatch()

Regular expressions allow for more flexible string comparisons.

re.search()

Use re.search() for partial, forward, and backward matching. Note that re.match() can also be used for forward matching, but it is not discussed here.

While various metacharacters (special characters) can be used in regular expression patterns, you can also specify a plain string without any metacharacters. If the string is found, a match object is returned; otherwise, None is returned. A match object always evaluates to True in a boolean context, and None evaluates to False.

import re

s = 'aaa-AAA-123'

print(re.search('aaa', s))
# <re.Match object; span=(0, 3), match='aaa'>

print(re.search('xxx', s))
# None
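Because a match object is truthy and None is falsy, the result of re.search() can be used directly in an if statement. One caveat when passing a plain string as the pattern: if it contains a metacharacter such as '.', it is no longer matched literally. re.escape() escapes such characters. A minimal sketch:

```python
import re

s = 'aaa-AAA-123'

# The result works directly as a condition
if re.search('aaa', s):
    print('contains aaa')

print(bool(re.search('aaa', s)))  # True
print(bool(re.search('xxx', s)))  # False

# '.' matches any character, so the pattern '1.3' matches '123'
print(bool(re.search('1.3', s)))  # True
# re.escape() makes the pattern match the literal text '1.3' only
print(bool(re.search(re.escape('1.3'), s)))  # False
```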

The metacharacter ^ matches the start of the string, and $ matches the end of the string.

print(re.search('^aaa', s))
# <re.Match object; span=(0, 3), match='aaa'>

print(re.search('^123', s))
# None

print(re.search('aaa$', s))
# None

print(re.search('123$', s))
# <re.Match object; span=(8, 11), match='123'>
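The matched text and its position can be read from the match object with group() and span(). For a literal prefix or suffix, ^ and $ give the same answer as startswith() and endswith(), at least for strings without a trailing newline (there, $ also matches just before the newline). The sketch below checks this on the example string.

```python
import re

s = 'aaa-AAA-123'

m = re.search('123$', s)
print(m.group())  # 123
print(m.span())   # (8, 11)

# For literal text, ^ / $ agree with startswith() / endswith() on this string
for sub in ['aaa', '123', 'AAA']:
    assert bool(re.search('^' + re.escape(sub), s)) == s.startswith(sub)
    assert bool(re.search(re.escape(sub) + '$', s)) == s.endswith(sub)
```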

In addition, a variety of other metacharacters and special sequences are available.

For example, [A-Z] matches any single uppercase letter from A to Z, and + matches the preceding pattern one or more times. Thus, [A-Z]+ matches any substring consisting of one or more consecutive uppercase letters.

print(re.search('[A-Z]+', s))
# <re.Match object; span=(4, 7), match='AAA'>
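Note that re.search() returns only the first match. To get every match, the standard re module provides re.findall(), which returns a list of all non-overlapping matched strings. The sample string below adds a second uppercase run for illustration.

```python
import re

s = 'aaa-AAA-123-BB'

# re.search() stops at the first match
print(re.search('[A-Z]+', s).group())  # AAA

# re.findall() returns every non-overlapping match as a list of strings
print(re.findall('[A-Z]+', s))  # ['AAA', 'BB']
```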

Basic examples of regular expression patterns, such as wildcard-like patterns, are covered in a separate article.

re.fullmatch()

Use re.fullmatch() to check whether the entire string matches a regular expression pattern. If any part of the string falls outside the match, None is returned, even if another part matches.

s = '012-3456-7890'

print(re.fullmatch(r'\d{3}-\d{4}-\d{4}', s))
# <re.Match object; span=(0, 13), match='012-3456-7890'>

s = 'tel: 012-3456-7890'

print(re.fullmatch(r'\d{3}-\d{4}-\d{4}', s))
# None

\d matches a decimal digit, and {n} matches the preceding pattern exactly n times. Since the backslash \ is used in special sequences of regular expressions, such as \d, it is convenient to use raw strings (r'' or r""), which treat backslashes as literal characters.
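The sketch below first confirms what a raw string contains, then wraps the phone-number pattern from the examples above in a compiled pattern. is_phone is a hypothetical helper for illustration. The loop contrasts fullmatch() with search(): search() also accepts strings where the pattern appears somewhere inside, including '012-3456-78901' with an extra digit.

```python
import re

# In a raw string, r'\d' is two characters: a backslash and 'd'
print(r'\d' == '\\d')  # True
print(len(r'\d'))      # 2

# Compile once and reuse; Pattern.fullmatch() requires the whole string to match
PHONE = re.compile(r'\d{3}-\d{4}-\d{4}')

def is_phone(s: str) -> bool:
    """Hypothetical helper: True only if s is exactly a phone number."""
    return PHONE.fullmatch(s) is not None

for s in ['012-3456-7890', 'tel: 012-3456-7890', '012-3456-78901']:
    # Columns: input, fullmatch result, search result
    print(s, is_phone(s), PHONE.search(s) is not None)
```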

re.fullmatch() was added in Python 3.4. In earlier versions, you can get essentially the same result with re.search() and the anchors ^ and $. re.match() combined with $ also works, although it is not shown here.

s = '012-3456-7890'

print(re.search(r'^\d{3}-\d{4}-\d{4}$', s))
# <re.Match object; span=(0, 13), match='012-3456-7890'>

s = 'tel: 012-3456-7890'

print(re.search(r'^\d{3}-\d{4}-\d{4}$', s))
# None
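^ and $ are not a perfect substitute for re.fullmatch(): $ also matches just before a newline at the very end of the string. The sketch below shows the difference with a trailing newline, and uses the special sequence \Z, which matches only at the absolute end of the string, to get behavior like fullmatch() for a pattern such as this one.

```python
import re

s = '012-3456-7890\n'  # note the trailing newline

# $ also matches just before the final newline, so this succeeds
print(re.search(r'^\d{3}-\d{4}-\d{4}$', s))
# <re.Match object; span=(0, 13), match='012-3456-7890'>

# fullmatch() must consume the entire string, including the newline
print(re.fullmatch(r'\d{3}-\d{4}-\d{4}', s))  # None

# \Z matches only at the very end of the string
print(re.search(r'^\d{3}-\d{4}-\d{4}\Z', s))  # None
```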

re.IGNORECASE

To perform case-insensitive comparisons, specify re.IGNORECASE as the flags argument for functions like re.search() and re.fullmatch().

s = 'ABC'

print(re.search('abc', s))
# None

print(re.search('abc', s, re.IGNORECASE))
# <re.Match object; span=(0, 3), match='ABC'>
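A few related forms, sketched below: re.I is a short alias for re.IGNORECASE, the flag works the same way with re.fullmatch() and with re.compile(), and an inline (?i) at the start of the pattern has the same effect as passing the flag.

```python
import re

s = 'ABC'

# re.I is an alias for re.IGNORECASE
print(re.I == re.IGNORECASE)  # True

# The flag works with fullmatch() and with compiled patterns
print(re.fullmatch('abc', s, re.I))
# <re.Match object; span=(0, 3), match='ABC'>
pattern = re.compile('abc', re.IGNORECASE)
print(pattern.search('xxABCxx'))
# <re.Match object; span=(2, 5), match='ABC'>

# An inline (?i) at the start of the pattern has the same effect
print(re.search('(?i)abc', s))
# <re.Match object; span=(0, 3), match='ABC'>
```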
