How to Use Regex Match Objects in Python
In Python's re module, match() and search() return match objects when a string matches a regular expression pattern. You can extract the matched string and its position using methods provided by the match object.
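The two functions differ in where they look. match() only tries the pattern at the beginning of the string, while search() scans forward for the first position where it matches. The string below is a hypothetical example, not the article's sample:

```python
import re

text = 'mail: aaa@xxx.com'  # hypothetical input with a prefix before the address
pattern = r'[a-z]+@[a-z]+\.[a-z]+'

# match() is anchored at index 0, where 'mail: ' does not fit the pattern
m1 = re.match(pattern, text)
print(m1)
# None

# search() scans forward and finds the address starting at index 6
m2 = re.search(pattern, text)
print(m2)
# <re.Match object; span=(6, 17), match='aaa@xxx.com'>
```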
The sample code in this article uses the following string as an example.
import re
s = 'aaa@xxx.com'
For more information on how to use the functions and other features of the re module, see the following article:
Get the matched position: start(), end(), span()
When a string matches a regex pattern using match() or search(), a match object is returned.
m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(type(m))
# <class 're.Match'>
You can get the position (index) of the matched substring using the match object's methods start(), end(), and span().
print(m.start())
# 0
print(m.end())
# 11
print(m.span())
# (0, 11)
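start() returns the index where the matched substring begins, end() returns the index just after its last character, and span() returns both as a tuple (start, end). Because end() is exclusive, s[m.start():m.end()] gives the matched substring.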
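Since the indices follow Python's slicing convention, they can be used directly as slice bounds. A minimal sketch using search() on a hypothetical string where the match does not start at 0:

```python
import re

text = 'To: aaa@xxx.com'  # hypothetical input
m = re.search(r'[a-z]+@[a-z]+\.[a-z]+', text)

start, end = m.span()
print(start, end)
# 4 15

# Slicing with start() and end() reproduces the matched substring
print(text[m.start():m.end()])
# aaa@xxx.com
```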
Extract the matched string: group()
You can extract the matched part as a string using the match object's group() method.
m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.group())
# aaa@xxx.com
print(type(m.group()))
# <class 'str'>
Grouping in regex patterns
Parentheses () are used to group part of a regex pattern string.
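Grouping also lets a quantifier such as + apply to a whole sub-pattern instead of a single character. When a group repeats, the string stored for that group is only its last repetition. A small sketch with a hypothetical pattern and string (group() with a number is covered in the sections below):

```python
import re

# Hypothetical input: a letter followed by a digit, repeated
m = re.match(r'([a-z]\d)+', 'a1b2c3!')

# + repeats the whole group, so the full match covers every repetition
print(m.group(0))
# a1b2c3

# A repeated group keeps only the text of its last repetition
print(m.group(1))
# c3
```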
Extract each group's string: groups()
You can extract a tuple containing the strings that matched each group using the match object's groups() method.
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.groups())
# ('aaa', 'xxx', 'com')
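If an optional group does not take part in the match, groups() puts None in its slot, and the default argument of groups() substitutes another value. A sketch with a hypothetical pattern in which the .com part is optional:

```python
import re

# Hypothetical pattern: the trailing '.com' group is optional
m = re.match(r'([a-z]+)@([a-z]+)(\.com)?', 'aaa@xxx')

print(m.groups())
# ('aaa', 'xxx', None)

# default replaces None for groups that did not participate
print(m.groups(default=''))
# ('aaa', 'xxx', '')
```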
Get the string and position of any group
When using grouping, the group() method lets you access the string of any group by passing its number as an argument. If the argument is omitted or set to 0, it returns the entire match. Numbers from 1 upward return the string of each group in order, and a number larger than the number of groups raises an IndexError.
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.group())
# aaa@xxx.com
print(m.group(0))
# aaa@xxx.com
print(m.group(1))
# aaa
print(m.group(2))
# xxx
print(m.group(3))
# com
# print(m.group(4))
# IndexError: no such group
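Since Python 3.6, a match object can also be indexed directly: m[g] is equivalent to m.group(g). A short sketch reusing the article's pattern and string:

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# m[g] is shorthand for m.group(g)
print(m[0])
# aaa@xxx.com
print(m[2])
# xxx
print(m[3] == m.group(3))
# True
```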
Passing multiple numbers to group() returns a tuple of the corresponding strings, so you can select only the groups you need.
print(m.group(0, 1, 3))
# ('aaa@xxx.com', 'aaa', 'com')
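Because the result is an ordinary tuple, it can be unpacked straight into variables. The variable names below are arbitrary choices for illustration:

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# Pick groups 1 and 3 and unpack the returned tuple
local, tld = m.group(1, 3)
print(local, tld)
# aaa com
```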
start(), end(), and span() accept a group number in the same way as group(), but they take at most one argument, so you cannot pass several groups at once.
print(m.span())
# (0, 11)
print(m.span(3))
# (8, 11)
# print(m.span(4))
# IndexError: no such group
# print(m.span(0, 1))
# TypeError: span expected at most 1 arguments, got 2
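To list every group with its position, you can loop over the group numbers one at a time. This sketch assumes the match came from the pattern shown here; m.re is the compiled pattern that produced the match, and its groups attribute is the number of capturing groups:

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# m.re.groups counts the capturing groups in the pattern
for i in range(1, m.re.groups + 1):
    print(i, m.group(i), m.span(i))
# 1 aaa (0, 3)
# 2 xxx (4, 7)
# 3 com (8, 11)
```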
Nested groups
Grouping parentheses () can be nested. To include the entire match string in the result of groups(), enclose the whole pattern in (). Groups are numbered by the position of their opening parenthesis (, counting from left to right.
m = re.match(r'(([a-z]+)@([a-z]+)\.([a-z]+))', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.groups())
# ('aaa@xxx.com', 'aaa', 'xxx', 'com')
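The numbering rule is easier to see with a smaller, hypothetical pattern where an outer group wraps two inner groups and another group follows it:

```python
import re

# Hypothetical pattern: group 1 wraps groups 2 and 3; group 4 follows
m = re.match(r'((a)(b))(c)', 'abc')

# Numbers follow the order of the opening '(' from left to right
print(m.groups())
# ('ab', 'a', 'b', 'c')
print(m.span(1), m.span(4))
# (0, 2) (2, 3)
```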
Set names for groups
By writing ?P<name> immediately after the opening ( of a group, as in (?P<name>...), you can assign a custom name to the group. You can then pass the name instead of a number to group(), start(), end(), or span() to get the corresponding string or its position.
m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.group('local'))
# aaa
print(m.group('SLD'))
# xxx
print(m.group('TLD'))
# com
Group numbers still work when custom names are assigned, and names and numbers can be mixed in a single call.
print(m.group(0))
# aaa@xxx.com
print(m.group(3))
# com
print(m.group(0, 2, 'TLD'))
# ('aaa@xxx.com', 'xxx', 'com')
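The mapping from names to numbers is stored in the compiled pattern's groupindex attribute, which is reachable from a match object as m.re.groupindex. A sketch using the same named pattern:

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)

# groupindex maps each group name to its number (a read-only mapping)
print(dict(m.re.groupindex))
# {'local': 1, 'SLD': 2, 'TLD': 3}

# A name and its number refer to the same group
print(m.group('TLD') == m.group(m.re.groupindex['TLD']))
# True
```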
Naming does not affect the result of groups().
print(m.groups())
# ('aaa', 'xxx', 'com')
Extract each group's string as a dictionary: groupdict()
The groupdict() method returns a dictionary (dict) whose keys are the group names and whose values are the matched strings.
m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.groupdict())
# {'local': 'aaa', 'SLD': 'xxx', 'TLD': 'com'}
print(type(m.groupdict()))
# <class 'dict'>
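groupdict() only contains named groups, so unnamed groups are left out. The resulting dictionary also works with str.format_map() for building a new string. A sketch with a hypothetical variant of the pattern in which the middle group has no name:

```python
import re

s = 'aaa@xxx.com'
# Hypothetical pattern: the middle group is unnamed
m = re.match(r'(?P<local>[a-z]+)@([a-z]+)\.(?P<TLD>[a-z]+)', s)

print(m.groupdict())
# {'local': 'aaa', 'TLD': 'com'}

# Named groups can fill placeholders in a template string
print('{local} / {TLD}'.format_map(m.groupdict()))
# aaa / com
```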
Match objects in if statements
When evaluated as a Boolean value, a match object is always considered True.
print(re.match(r'[a-z]+@[a-z]+\.[a-z]+', s))
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(bool(re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)))
# True
match() and search() return None when there is no match, and None is evaluated as False.
print(re.match('[0-9]+', s))
# None
print(bool(re.match('[0-9]+', s)))
# False
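Since None has no group() method, calling it on a failed match raises an AttributeError, which is one reason to check the return value before using it. A short sketch:

```python
import re

s = 'aaa@xxx.com'
m = re.match('[0-9]+', s)  # no digits at the start, so m is None

error = None
try:
    m.group()
except AttributeError as e:
    # None has no group() method
    error = e
print(error)
# 'NoneType' object has no attribute 'group'
```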
Therefore, to simply check whether a match occurred, you can use the call to match() or search(), or its return value, directly as the condition of an if statement.
if re.match(r'[a-z]+@[a-z]+\.[a-z]+', s):
    print('match')
else:
    print('no match')
# match

if re.match('[0-9]+', s):
    print('match')
else:
    print('no match')
# no match
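If you also need the match object inside the if block, the assignment expression operator := (available since Python 3.8) lets you store the result and test it in one step. A sketch:

```python
import re

s = 'aaa@xxx.com'

# Assign and test in one expression (Python 3.8+)
if m := re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s):
    print('match:', m.group(1))
else:
    print('no match')
# match: aaa
```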
However, be aware that some regex patterns can match a zero-length string (the empty string ''), and such a match object is still evaluated as True.
m = re.match('[0-9]*', s)
print(m)
# <re.Match object; span=(0, 0), match=''>
print(m.group() == '')
# True
print(bool(m))
# True
if re.match('[0-9]*', s):
    print('match')
else:
    print('no match')
# match
Be careful when using *, which means zero or more repetitions, as the example shows.
If you want to treat an empty match as a non-match, first check that the match object is not None, and then check that the string returned by its group() method is not empty.
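A minimal sketch of that two-step check, wrapped in a hypothetical helper named is_nonempty_match (not part of the re module): the match object is tested first to rule out None, then the matched string is tested to rule out ''.

```python
import re

def is_nonempty_match(pattern, string):
    """Hypothetical helper: True only if re.match() finds a non-empty match."""
    m = re.match(pattern, string)
    # m rules out None; m.group() rules out an empty match
    return bool(m and m.group())

s = 'aaa@xxx.com'
print(is_nonempty_match('[0-9]*', s))
# False
print(is_nonempty_match('[a-z]*', s))
# True
```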