note.nkmk.me

Compare strings in Python (exact match, partial match, etc.)

Posted: 2021-09-06 / Tags: Python, String, Regular expression

This article describes how to compare strings (str) in Python.

  • Exact match (equality comparison): ==, !=
  • Partial match: in, not in
  • Forward / Backward match: startswith(), endswith()
  • Order comparison: <, <=, >, >=
  • Case-insensitive comparison: upper(), lower()
  • Regular expressions: re.search(), re.fullmatch()

Exact match (equality comparison): ==, !=

As with numbers, the == operator is used to determine if two strings are equal. If they are equal, True is returned, and if they are not, False is returned.

print('abc' == 'abc')
# True

print('abc' == 'xyz')
# False

It is case-sensitive. The same applies to comparisons with other operators and methods. See below for a case-insensitive comparison.

print('abc' == 'ABC')
# False

!= returns True if they are not equal, and False if they are equal.

print('abc' != 'xyz')
# True

print('abc' != 'abc')
# False
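
One practical consequence, sketched below: == compares the complete values, so stray whitespace makes otherwise identical strings unequal, and a string is never equal to a number. The variable names raw and expected are made up for illustration.

```python
# Hypothetical values for illustration.
raw = ' abc\n'
expected = 'abc'

# Leading and trailing whitespace is part of the string.
print(raw == expected)
# False

# Stripping the whitespace first makes the comparison succeed.
print(raw.strip() == expected)
# True

# Comparing a str with a non-str raises no error; the result is simply False.
print('1' == 1)
# False
```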

Partial match: in, not in

Use the in operator for partial matches, i.e., whether one string contains the other string.

x in y returns True if x is contained in y (x is a substring of y), and False if it is not. If the characters of x appear in y only separately, not as one contiguous run, False is returned.

print('bbb' in 'aaa-bbb-ccc')
# True

print('xxx' in 'aaa-bbb-ccc')
# False

print('abc' in 'aaa-bbb-ccc')
# False

x not in y returns True if x is not contained in y, and False if it is.

print('xxx' not in 'aaa-bbb-ccc')
# True

print('bbb' not in 'aaa-bbb-ccc')
# False

in and not in are also used to check the existence of elements in a list.
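
To test a string against several candidate substrings, one possible approach is to combine in with the built-in functions any() and all(). This is a minimal sketch; the keywords list is a made-up example. The last line shows that, with a list on the right-hand side, in looks for an equal element rather than a substring.

```python
s = 'aaa-bbb-ccc'
keywords = ['xxx', 'bbb']  # hypothetical search terms

# True if at least one keyword is a substring of s.
print(any(k in s for k in keywords))
# True

# True only if every keyword is a substring of s.
print(all(k in s for k in keywords))
# False

# With a list, `in` tests for an equal element, not for a substring.
print('bbb' in ['aaa-bbb-ccc'])
# False
```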

Forward / Backward match: startswith(), endswith()

Use the string method startswith() for forward matching, i.e., whether a string starts with the specified string.

s = 'aaa-bbb-ccc'

print(s.startswith('aaa'))
# True

print(s.startswith('bbb'))
# False

You can also specify a tuple of strings as an argument.

True is returned if the string starts with one of the elements of the tuple, and False is returned if the string does not start with any of them. Note that an error will occur if you specify a list instead of a tuple.

print(s.startswith(('aaa', 'bbb', 'ccc')))
# True

print(s.startswith(('xxx', 'yyy', 'zzz')))
# False

# print(s.startswith(['a', 'b', 'c']))
# TypeError: startswith first arg must be str or a tuple of str, not list

Use the string method endswith() for backward matching, i.e., whether a string ends with the specified string. Its usage is the same as startswith().

print(s.endswith('ccc'))
# True

print(s.endswith('bbb'))
# False

print(s.endswith(('aaa', 'bbb', 'ccc')))
# True
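
Because a list is rejected, a list of candidates has to be converted with tuple() before it is passed to startswith() or endswith(). The following sketch filters file names by extension; the file names and extensions are invented for illustration. Both methods also accept an optional start position as the second argument, shown at the end.

```python
# Hypothetical file names and extensions for illustration.
filenames = ['a.png', 'b.txt', 'c.jpg', 'd.csv']
extensions = ['.png', '.jpg']  # a list, for example one loaded from a config

# endswith() does not accept a list, so convert it with tuple() first.
images = [f for f in filenames if f.endswith(tuple(extensions))]
print(images)
# ['a.png', 'c.jpg']

# The optional second argument is the position where the test starts.
s = 'aaa-bbb-ccc'
print(s.startswith('bbb', 4))
# True
```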

Order comparison: <, <=, >, >=

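Like numbers, strings can be compared with the <, <=, >, and >= operators. They are compared in lexicographical order: character by character from the start, with a shorter string ordered before any longer string it is a prefix of.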

print('a' < 'b')
# True

print('aa' < 'ab')
# True

print('abc' < 'abcd')
# True

Individual characters are ordered by their Unicode code points.

You can get the Unicode code point of a character with the built-in function ord().

print(ord('a'))
# 97

print(ord('b'))
# 98

Uppercase letters have smaller code points than lowercase letters.

print('Z' < 'a')
# True

print(ord('Z'))
# 90

When a list of strings is sorted with the list method sort() or the built-in function sorted(), the order is also determined based on Unicode code points.

print(sorted(['aaa', 'abc', 'Abc', 'ABC']))
# ['ABC', 'Abc', 'aaa', 'abc']
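
If code point order is not what you want, one option is to pass a key function to sorted(). The sketch below uses key=str.lower so that case is ignored while sorting, and also shows that strings of digits are compared as text, not as numbers. The words list is the same sample data as above.

```python
words = ['aaa', 'abc', 'Abc', 'ABC']

# Compare the lowercased form of each element; ties keep their original order.
print(sorted(words, key=str.lower))
# ['aaa', 'abc', 'Abc', 'ABC']

# Digit strings are compared character by character: '1' sorts before '9'.
print('10' < '9')
# True

# Convert to int to compare them as numbers.
print(int('10') < int('9'))
# False
```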

Case-insensitive comparison: upper(), lower()

All the operators and methods described so far are case-sensitive.

If you need a case-insensitive comparison, you can use upper() or lower() to convert both strings to uppercase or lowercase.

s1 = 'abc'
s2 = 'ABC'

print(s1 == s2)
# False

print(s1.lower() == s2.lower())
# True
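
The same conversion works with the other operators and methods. For text outside ASCII, the string method casefold() is a more aggressive alternative to lower(). The sketch below uses the German word 'Straße' as an assumed example, since 'ß' has no single-character uppercase form that lower() would map back to it.

```python
# Normalize both sides before a partial or forward match.
s = 'AAA-Bbb-ccc'
print('bbb' in s.lower())
# True
print(s.lower().startswith('aaa'))
# True

# lower() leaves 'ß' unchanged, so these two spellings do not compare equal.
s1 = 'Straße'
s2 = 'STRASSE'
print(s1.lower() == s2.lower())
# False

# casefold() converts 'ß' to 'ss', so the comparison succeeds.
print(s1.casefold() == s2.casefold())
# True
```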

Regular expressions: re.search(), re.fullmatch()

Regular expressions can be used for more flexible comparisons.

re.search()

Use re.search() for partial matching, forward matching, and backward matching. Note that re.match() can also be used for forward matching, but it is not discussed here.

Various meta characters (special characters) can be used in regular expression patterns, but it is also possible to simply specify a string as it is. A match object is returned if the pattern is found, and None if it is not. A match object is always evaluated as True, so the result can be used directly as a condition.

import re

s = 'aaa-AAA-123'

print(re.search('aaa', s))
# <re.Match object; span=(0, 3), match='aaa'>

print(re.search('xxx', s))
# None

The meta character ^ matches the start of the string, and $ matches the end of the string.

print(re.search('^aaa', s))
# <re.Match object; span=(0, 3), match='aaa'>

print(re.search('^123', s))
# None

print(re.search('aaa$', s))
# None

print(re.search('123$', s))
# <re.Match object; span=(8, 11), match='123'>

In addition, a variety of other meta characters and special sequences are available.

For example, [A-Z] represents any one letter of the uppercase alphabet, and + means that the previous pattern is repeated one or more times. Thus, [A-Z]+ matches any substring that consists of one or more consecutive uppercase alphabetic characters.

print(re.search('[A-Z]+', s))
# <re.Match object; span=(4, 7), match='AAA'>
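
Since the result is either a match object or None, it can be used directly in an if statement. A minimal sketch using the same sample string, reading the matched text and its position with the match object's group(), start(), and end() methods:

```python
import re

s = 'aaa-AAA-123'

m = re.search('[A-Z]+', s)
if m:
    # group() is the matched text; start() and end() are its position.
    print(m.group(), m.start(), m.end())
# AAA 4 7

if re.search('xxx', s) is None:
    print('no match')
# no match
```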

re.fullmatch()

Use re.fullmatch() to check whether the whole string matches a regular expression pattern. If only part of the string matches, None is returned.

s = '012-3456-7890'

print(re.fullmatch(r'\d{3}-\d{4}-\d{4}', s))
# <re.Match object; span=(0, 13), match='012-3456-7890'>

s = 'tel: 012-3456-7890'

print(re.fullmatch(r'\d{3}-\d{4}-\d{4}', s))
# None

\d represents a digit and {n} represents n repetitions. Since the backslash \ is used in special sequences of regular expressions, such as \d, it is convenient to use raw strings (r'' or r""), which treat backslashes \ as literal characters.
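
For repeated checks, one way to package this is a small function that returns a bool. The sketch below is only an illustration: the names PHONE_PATTERN and is_phone_number are hypothetical, and the pattern is the same one used above. The last line shows that a raw string simply saves you from doubling the backslash.

```python
import re

# Compile once so the pattern can be reused; the name is hypothetical.
PHONE_PATTERN = re.compile(r'\d{3}-\d{4}-\d{4}')


def is_phone_number(s):
    """Return True if the whole string has the form ddd-dddd-dddd."""
    return PHONE_PATTERN.fullmatch(s) is not None


print(is_phone_number('012-3456-7890'))
# True

print(is_phone_number('tel: 012-3456-7890'))
# False

# r'\d' and '\\d' are the same two-character string.
print(r'\d' == '\\d')
# True
```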

re.fullmatch() was added in Python 3.4. In earlier versions, you can use re.search() with ^ and $ to do the same. You can also use re.match() and $, although it is not shown here.

s = '012-3456-7890'

print(re.search(r'^\d{3}-\d{4}-\d{4}$', s))
# <re.Match object; span=(0, 13), match='012-3456-7890'>

s = 'tel: 012-3456-7890'

print(re.search(r'^\d{3}-\d{4}-\d{4}$', s))
# None
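
For reference, the re.match() variant mentioned above might look like the following sketch. Because re.match() only matches at the start of the string, ^ is unnecessary. One difference worth knowing: $ also matches just before a trailing newline, whereas re.fullmatch() requires the pattern to cover the entire string. The variable name pattern is chosen for illustration.

```python
import re

pattern = r'\d{3}-\d{4}-\d{4}'

print(re.match(pattern + '$', '012-3456-7890'))
# <re.Match object; span=(0, 13), match='012-3456-7890'>

print(re.match(pattern + '$', 'tel: 012-3456-7890'))
# None

# $ also matches just before a newline at the very end of the string.
print(re.match(pattern + '$', '012-3456-7890\n'))
# <re.Match object; span=(0, 13), match='012-3456-7890'>

# fullmatch() must consume the whole string, including the newline.
print(re.fullmatch(pattern, '012-3456-7890\n'))
# None
```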

re.IGNORECASE

Passing re.IGNORECASE as the flags argument of functions such as re.search() and re.fullmatch() makes the comparison case-insensitive.

s = 'ABC'

print(re.search('abc', s))
# None

print(re.search('abc', s, re.IGNORECASE))
# <re.Match object; span=(0, 3), match='ABC'>
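
The flag can also be passed by keyword, or given to re.compile() when a pattern is reused; re.I is a short alias for re.IGNORECASE. A brief sketch with made-up sample strings:

```python
import re

s = 'ABC'

# The flag works the same way with fullmatch(), here passed by keyword.
print(re.fullmatch('abc', s, flags=re.IGNORECASE))
# <re.Match object; span=(0, 3), match='ABC'>

# A compiled pattern remembers its flags.
p = re.compile('abc', re.I)
print(p.search('xx-Abc-xx'))
# <re.Match object; span=(3, 6), match='Abc'>
```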