How to Use Regex Match Objects in Python

Modified: 2023-05-18 | Tags: Python, String, Regex

In Python's re module, match() and search() return match objects when a string matches a regular expression pattern. You can extract the matched string and its position using methods provided by the match object.

re - Match Objects — Regular expression operations — Python 3.11.3 documentation

Contents

Get the matched position: start(), end(), span()
Extract the matched string: group()
Grouping in regex patterns
Match objects in if statements

The sample code in this article uses the following string as an example.

import re

s = 'aaa@xxx.com'

source: re_match_object.py

For more information on how to use the functions and other features of the re module, see the following article:

Regular expressions with the re module in Python

Get the matched position: `start()`, `end()`, `span()`

match() and search() return a match object when the string matches the regex pattern.

m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(type(m))
# <class 're.Match'>

source: re_match_object.py

You can get the position (index) of the matched substring using the match object's methods start(), end(), and span().

print(m.start())
# 0

print(m.end())
# 11

print(m.span())
# (0, 11)

source: re_match_object.py

start() returns the index where the matched substring begins, end() returns the index immediately after its last character, and span() returns both as a tuple (start, end).
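
As a brief illustration of how these indices relate to slicing, the following sketch uses search() so that the match does not begin at index 0. The string `text` is a made-up example for this sketch and is not part of the article's sample code.

```python
import re

# Hypothetical sample string for this sketch (not the article's `s`).
text = 'mail: aaa@xxx.com'

m = re.search(r'[a-z]+@[a-z]+\.[a-z]+', text)

print(m.start())
# 6

print(m.end())
# 17

# end() is the index just past the last matched character,
# so slicing with start() and end() reproduces the matched string.
print(text[m.start():m.end()])
# aaa@xxx.com
```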

Extract the matched string: `group()`

You can extract the matched part as a string using the match object's group() method.

m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.group())
# aaa@xxx.com

print(type(m.group()))
# <class 'str'>

source: re_match_object.py

Grouping in regex patterns

Parentheses () are used to group part of a regex pattern string.

Extract each group's string: `groups()`

You can extract a tuple containing the strings that matched each group using the match object's groups() method.

m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.groups())
# ('aaa', 'xxx', 'com')

source: re_match_object.py
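
Because groups() returns an ordinary tuple, it can be unpacked into separate variables. The following is a minimal sketch that reuses the article's sample string; the variable names `local`, `sld`, and `tld` are chosen only for illustration.

```python
import re

s = 'aaa@xxx.com'

m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# Unpack the tuple; the number of variables must equal the number of groups.
local, sld, tld = m.groups()

print(local)
# aaa

print(sld)
# xxx

print(tld)
# com
```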

Get the string and position of any group

When using grouping, the group() method allows you to access the string of any group by specifying its number as an argument. If the argument is omitted or set to 0, it returns the entire match. Specifying 1 or higher returns the string of the corresponding group in order, and a value larger than the number of groups raises IndexError.

m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.group())
# aaa@xxx.com

print(m.group(0))
# aaa@xxx.com

print(m.group(1))
# aaa

print(m.group(2))
# xxx

print(m.group(3))
# com

# print(m.group(4))
# IndexError: no such group

source: re_match_object.py

Supplying multiple numbers as arguments to group() returns a tuple with the corresponding strings. This way, you can select only the desired groups.

print(m.group(0, 1, 3))
# ('aaa@xxx.com', 'aaa', 'com')

source: re_match_object.py

start(), end(), and span() also accept a group number as an argument, but unlike group(), they take at most one argument.

print(m.span())
# (0, 11)

print(m.span(3))
# (8, 11)

# print(m.span(4))
# IndexError: no such group

# print(m.span(0, 1))
# TypeError: span expected at most 1 arguments, got 2

source: re_match_object.py

Nested groups

Grouping parentheses () can be nested. To retrieve the entire match string with groups(), enclose the entire pattern with (). Groups are numbered by the position of their opening parenthesis (, counting from left to right.

m = re.match(r'(([a-z]+)@([a-z]+)\.([a-z]+))', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.groups())
# ('aaa@xxx.com', 'aaa', 'xxx', 'com')

source: re_match_object.py

Set names for groups

By adding ?P<xxx> at the start of (), you can assign a custom name to the group. Then you can specify the name instead of a number as an argument to group(), start(), end(), or span() to access the corresponding part of the string or its position.

m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.group('local'))
# aaa

print(m.group('SLD'))
# xxx

print(m.group('TLD'))
# com

source: re_match_object.py

You can also use numbers, even if custom names are assigned.

print(m.group(0))
# aaa@xxx.com

print(m.group(3))
# com

print(m.group(0, 2, 'TLD'))
# ('aaa@xxx.com', 'xxx', 'com')

source: re_match_object.py

Naming does not affect the result of groups().

print(m.groups())
# ('aaa', 'xxx', 'com')

source: re_match_object.py
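
The examples in this section pass group names only to group(). As a small sketch of the position methods mentioned at the start of the section, the same names can be given to start(), end(), and span(). The pattern and string are the article's own; only the particular calls are chosen for illustration.

```python
import re

s = 'aaa@xxx.com'

m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)

print(m.start('SLD'))
# 4

print(m.end('SLD'))
# 7

print(m.span('TLD'))
# (8, 11)

# A name that is not defined in the pattern raises IndexError,
# just like an out-of-range group number.
# print(m.group('undefined'))
# IndexError: no such group
```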

Extract each group's string as a dictionary: `groupdict()`

You can get a dictionary (dict) where the group names are keys and the matched strings are values using the groupdict() method.

m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.groupdict())
# {'local': 'aaa', 'SLD': 'xxx', 'TLD': 'com'}

print(type(m.groupdict()))
# <class 'dict'>

source: re_match_object.py
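
Since the result is a plain dict, it can be used like any other dictionary. The sketch below shows one possible use, passing the items to str.format() as keyword arguments; the output format string is an arbitrary choice for this example.

```python
import re

s = 'aaa@xxx.com'

m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
d = m.groupdict()

# Access a single value by group name.
print(d['TLD'])
# com

# Unpack the dictionary into keyword arguments for str.format().
print('{local} at {SLD} dot {TLD}'.format(**d))
# aaa at xxx dot com
```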

Match objects in `if` statements

When evaluated as Boolean values, match objects are always considered True.

print(re.match(r'[a-z]+@[a-z]+\.[a-z]+', s))
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(bool(re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)))
# True

source: re_match_object.py

match() and search() return None when there is no match, which is evaluated as False.

Convert bool (True, False) and other types to each other in Python

print(re.match('[0-9]+', s))
# None

print(bool(re.match('[0-9]+', s)))
# False

source: re_match_object.py

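Therefore, to simply determine whether a match has occurred, you can use the return value of match() or search() as the condition of an if statement, either by calling the function directly in the condition or by storing the result in a variable first.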

if re.match(r'[a-z]+@[a-z]+\.[a-z]+', s):
    print('match')
else:
    print('no match')
# match

source: re_match_object.py

if re.match('[0-9]+', s):
    print('match')
else:
    print('no match')
# no match

source: re_match_object.py
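
When the matched string is needed inside the if block, the return value can be stored in a variable first. The following is a sketch of one way to write this. The second form uses an assignment expression (`:=`), which assumes Python 3.8 or later; the variable names `m` and `m2` are arbitrary.

```python
import re

s = 'aaa@xxx.com'

# Store the return value, then test it.
m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
if m:
    print(m.group())
else:
    print('no match')
# aaa@xxx.com

# The same kind of check with an assignment expression (Python 3.8+).
if (m2 := re.match(r'[0-9]+', s)):
    print(m2.group())
else:
    print('no match')
# no match
```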

However, be aware that some regex patterns may match a zero-length string (empty string ''), which is still evaluated as True.

m = re.match('[0-9]*', s)
print(m)
# <re.Match object; span=(0, 0), match=''>

print(m.group() == '')
# True

print(bool(m))
# True

if re.match('[0-9]*', s):
    print('match')
else:
    print('no match')
# match

source: re_match_object.py

Be careful when using * to denote zero or more repetitions, as demonstrated in the example.

If you wish to treat a match with an empty string as a non-match, first check the match object itself and then check the string returned by group(). An empty string '' is evaluated as False, so both checks can be used as conditions.
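
A minimal sketch of this two-step check is shown below. The helper name `has_nonempty_match` and the second test string `'123abc'` are hypothetical and introduced only for this example.

```python
import re

s = 'aaa@xxx.com'


def has_nonempty_match(pattern, string):
    """Return True only if the pattern matches a non-empty string.

    Hypothetical helper for illustration, not part of the re module.
    """
    m = re.match(pattern, string)
    # `m` is None when there is no match, and m.group() is '' for a
    # zero-length match; both are evaluated as False.
    return bool(m and m.group())


print(has_nonempty_match('[0-9]*', s))
# False

print(has_nonempty_match('[0-9]*', '123abc'))
# True

print(has_nonempty_match('[0-9]+', s))
# False
```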