Extract a Substring from a String in Python (Position, Regex)

Modified: 2025-04-29 | Tags: Python, String, Regex

This article explains how to extract a substring from a string in Python.

You can extract a substring by specifying its position and length, or by using regular expression (regex) patterns.

Contents

Extract a substring by position and length
Extract a substring with regex: re.search(), re.findall()
Examples of regex patterns

If you want to extract a substring from the contents of a text file, first read the file as a string.

Read, write, and create files in Python (with and open())

Extract a substring by position and length

Extract a character by index

You can get a character at a specific position by specifying its index in []. Indexes start at 0 (zero-based indexing).

s = 'abcde'

print(s[0])
# a

print(s[4])
# e

source: str_index_slice.py

You can also specify an index from the end of the string by using negative values. -1 refers to the last character.

print(s[-1])
# e

print(s[-5])
# a

source: str_index_slice.py

If you specify an index outside the valid range, an IndexError is raised.

# print(s[5])
# IndexError: string index out of range

# print(s[-6])
# IndexError: string index out of range

source: str_index_slice.py
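If an out-of-range index is a realistic possibility, one option is to catch the IndexError and fall back to a default value. The following is a minimal sketch; the helper name char_at and its default parameter are hypothetical and not part of the standard library.

```python
def char_at(s, index, default=None):
    # Hypothetical helper: return s[index], or `default` if the index is out of range.
    try:
        return s[index]
    except IndexError:
        return default


s = 'abcde'

print(char_at(s, 4))
# e

print(char_at(s, 5))
# None

print(repr(char_at(s, -6, default='')))
# ''
```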

Extract a substring by slicing

You can extract a substring within the range start <= x < stop using the syntax [start:stop]. If start is omitted, slicing begins from the start of the string. If stop is omitted, it continues to the end of the string.

s = 'abcde'

print(s[1:3])
# bc

print(s[:3])
# abc

print(s[1:])
# bcde

source: str_index_slice.py

Negative values are also supported.

print(s[-4:-2])
# bc

print(s[:-2])
# abc

print(s[-4:])
# bcde

source: str_index_slice.py

If start > stop, no error is raised; instead, an empty string ('') is returned.

print(s[3:1])
# 

print(s[3:1] == '')
# True

source: str_index_slice.py

Out-of-range values are automatically adjusted without raising an error.

print(s[-100:100])
# abcde

source: str_index_slice.py
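To extract a substring by position and length, you can slice with [start:start + length]. The sketch below wraps this in a hypothetical helper named substring (not a built-in); it assumes non-negative values and relies on the out-of-range adjustment described above, so a length that runs past the end simply returns the remaining characters.

```python
def substring(s, start, length):
    # Hypothetical helper: up to `length` characters of s beginning at index `start`.
    if start < 0 or length < 0:
        raise ValueError('start and length must be non-negative')
    return s[start:start + length]


s = 'abcde'

print(substring(s, 1, 3))
# bcd

print(substring(s, 3, 100))
# de

print(repr(substring(s, 10, 2)))
# ''
```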

In addition to start and stop, you can also specify a step value using [start:stop:step]. If step is negative, the substring will be returned in reverse order.

print(s[1:4:2])
# bd

print(s[::2])
# ace

print(s[::3])
# ad

print(s[::-1])
# edcba

print(s[::-2])
# eca

source: str_index_slice.py

For more details on slicing, see the following article:

How to slice a list, string, tuple in Python

Extract a substring based on character count

The built-in len() function returns the number of characters in a string. You can use it to get the central character or extract the first or second half of a string by slicing.

Note that only integers (int) are allowed for indexing [] and slicing [:]. If you attempt to use division / inside indexing or slicing, it will raise an error because the result is a floating-point number (float).

The following example uses floor division //, which rounds the result down to an integer (for positive values, this is the same as dropping the decimal part).

s = 'abcdefghi'

print(len(s))
# 9

# print(s[len(s) / 2])
# TypeError: string indices must be integers

print(s[len(s) // 2])
# e

print(s[:len(s) // 2])
# abcd

print(s[len(s) // 2:])
# efghi

source: str_index_slice.py
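The same combination of len() and slicing can split a string into pieces of a fixed number of characters. This is a minimal sketch; the function name chunks is hypothetical, and the last piece may be shorter than the others when the length is not a multiple of the size.

```python
def chunks(s, size):
    # Hypothetical helper: split s into consecutive pieces of `size` characters.
    if size <= 0:
        raise ValueError('size must be positive')
    # range() steps through the start index of each piece.
    return [s[i:i + size] for i in range(0, len(s), size)]


s = 'abcdefghi'

print(chunks(s, 4))
# ['abcd', 'efgh', 'i']

print(chunks(s, 3))
# ['abc', 'def', 'ghi']
```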

Extract a substring with regex: re.search(), re.findall()

In Python, you can use regular expressions (regex) with the re module of the standard library.

Regular expressions with the re module in Python

Use re.search() to extract the first substring that matches a regex pattern. Pass the regex pattern as the first argument and the target string as the second argument.

import re

s = '012-3456-7890'

print(re.search(r'\d+', s))
# <re.Match object; span=(0, 3), match='012'>

source: str_extract_re.py

In regex, \d matches a digit character, while + matches one or more occurrences of the preceding pattern. Therefore, \d+ matches one or more consecutive digits.

Since backslashes \ are used in special sequences like \d, it is convenient to use raw string notation by prefixing the string with r.

Raw strings in Python

If a match is found, re.search() returns a match object. You can retrieve the matched substring using the group() method of the match object.

m = re.search(r'\d+', s)

print(m.group())
# 012

print(type(m.group()))
# <class 'str'>

source: str_extract_re.py
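If no match is found, re.search() returns None, so calling group() directly on the result raises an AttributeError. One way to handle this is to check the result first, as in the sketch below. The helper name first_match and its default parameter are hypothetical.

```python
import re


def first_match(pattern, s, default=None):
    # Hypothetical helper: the first matching substring, or `default` if there is no match.
    m = re.search(pattern, s)
    return m.group() if m else default


print(first_match(r'\d+', '012-3456-7890'))
# 012

print(first_match(r'\d+', 'abc'))
# None
```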

For more information about match objects, refer to the following article:

How to use regex match objects in Python

As shown in the example above, re.search() returns only the first match, even if multiple matches exist. If you want to retrieve all matches, use re.findall(), which returns a list of all matching substrings.

print(re.findall(r'\d+', s))
# ['012', '3456', '7890']

source: str_extract_re.py
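re.findall() returns only the matched strings. If you also need the position of each match, re.finditer() yields a match object for every match, and its span() method returns the start and end indexes. The following sketch collects both; the variable name found is just for illustration.

```python
import re

s = '012-3456-7890'

# Pair each matched substring with its (start, end) position.
found = [(m.group(), m.span()) for m in re.finditer(r'\d+', s)]

for text, span in found:
    print(text, span)
# 012 (0, 3)
# 3456 (4, 8)
# 7890 (9, 13)
```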

Examples of regex patterns

This section provides examples of regex patterns using metacharacters and special sequences.

Wildcard-like patterns

. matches any single character except a newline, and * matches zero or more repetitions of the preceding pattern.

For example, a.*b matches a string that starts with a and ends with b. Since * can match zero occurrences, it also matches ab.

print(re.findall('a.*b', 'axyzb'))
# ['axyzb']

print(re.findall('a.*b', 'a---b'))
# ['a---b']

print(re.findall('a.*b', 'aあいうえおb'))
# ['aあいうえおb']

print(re.findall('a.*b', 'ab'))
# ['ab']

source: str_extract_re.py

+ matches one or more repetitions of the preceding pattern. Therefore, a.+b does not match ab.

print(re.findall('a.+b', 'ab'))
# []

print(re.findall('a.+b', 'axb'))
# ['axb']

print(re.findall('a.+b', 'axxxxxxb'))
# ['axxxxxxb']

source: str_extract_re.py

? matches zero or one occurrence of the preceding pattern. Therefore, a.?b matches ab as well as a and b with exactly one character between them.

print(re.findall('a.?b', 'ab'))
# ['ab']

print(re.findall('a.?b', 'axb'))
# ['axb']

print(re.findall('a.?b', 'axxb'))
# []

source: str_extract_re.py

Greedy and non-greedy matching

*, +, and ? are greedy: they match as much text as possible. In contrast, *?, +?, and ?? are non-greedy (minimal): they match as little text as possible.

s = 'axb-axxxxxxb'

print(re.findall('a.*b', s))
# ['axb-axxxxxxb']

print(re.findall('a.*?b', s))
# ['axb', 'axxxxxxb']

source: str_extract_re.py
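The difference matters when a delimiter appears more than once in the string. As an illustration (the sample string is made up for this example), the following extracts text enclosed in angle brackets with a greedy and a non-greedy pattern.

```python
import re

s = '<b>bold</b> and <i>italic</i>'

# Greedy: .+ runs to the last > in the string.
greedy = re.findall(r'<.+>', s)
print(greedy)
# ['<b>bold</b> and <i>italic</i>']

# Non-greedy: .+? stops at the first > after each <.
non_greedy = re.findall(r'<.+?>', s)
print(non_greedy)
# ['<b>', '</b>', '<i>', '</i>']
```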

Extract parts of the pattern with parentheses

You can enclose part of a regex pattern in parentheses () to extract only that part of the match.

print(re.findall('a(.*)b', 'axyzb'))
# ['xyz']

source: str_extract_re.py

To match literal parentheses (), escape them with a backslash \.

print(re.findall(r'\(.+\)', 'abc(def)ghi'))
# ['(def)']

print(re.findall(r'\((.+)\)', 'abc(def)ghi'))
# ['def']

source: str_extract_re.py
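When a pattern contains more than one pair of parentheses, re.findall() returns a list of tuples with one element per group. With re.search(), group(0) is the whole match and group(1), group(2), and so on are the individual groups. The sample string below is made up for illustration.

```python
import re

s = 'width=640, height=480'

# Two groups: each match becomes a (name, value) tuple.
pairs = re.findall(r'(\w+)=(\d+)', s)
print(pairs)
# [('width', '640'), ('height', '480')]

m = re.search(r'(\w+)=(\d+)', s)

print(m.group(0))
# width=640

print(m.group(1))
# width

print(m.group(2))
# 640
```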

Match any single character

Square brackets [] define a set of characters; the pattern matches any single character in that set.

A hyphen - between two characters (e.g., [a-z]) specifies a range of consecutive Unicode code points. For example, [a-z] matches any single lowercase ASCII letter.

print(re.findall('[abc]x', 'ax-bx-cx'))
# ['ax', 'bx', 'cx']

print(re.findall('[abc]+', 'abc-aaa-cba'))
# ['abc', 'aaa', 'cba']

print(re.findall('[a-z]+', 'abc-xyz'))
# ['abc', 'xyz']

source: str_extract_re.py
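Several ranges can be combined in one set, and a ^ placed at the very start of the set negates it, so that it matches any character not listed. The following sketch shows both; the sample strings are made up for illustration.

```python
import re

s = 'ID: a1B2-c3D4, code: XYZ_789'

# Letters and digits only; the underscore and punctuation split the matches.
alnum = re.findall('[a-zA-Z0-9]+', s)
print(alnum)
# ['ID', 'a1B2', 'c3D4', 'code', 'XYZ', '789']

# [^-] matches any character except a hyphen.
parts = re.findall('[^-]+', 'abc-def-ghi')
print(parts)
# ['abc', 'def', 'ghi']
```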

Match the start/end of the string

^ matches the start of a string, while $ matches the end.

s = 'abc-def-ghi'

print(re.findall('[a-z]+', s))
# ['abc', 'def', 'ghi']

print(re.findall('^[a-z]+', s))
# ['abc']

print(re.findall('[a-z]+$', s))
# ['ghi']

source: str_extract_re.py
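Anchors are useful for extracting a prefix or suffix. As a small sketch, the following pulls the leading name and the trailing extension out of a made-up filename, assuming both consist of lowercase letters.

```python
import re

filename = 'report.final.txt'

# $ anchors the match to the end, so only the last extension is found.
ext = re.search(r'\.[a-z]+$', filename)
print(ext.group())
# .txt

# ^ anchors the match to the start.
head = re.search(r'^[a-z]+', filename)
print(head.group())
# report
```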

Extract by multiple patterns

| allows you to match a substring that satisfies any one of multiple patterns. For example, to match substrings that follow either pattern A or pattern B, use A|B.

s = 'axxxb-012'

print(re.findall('a.*b', s))
# ['axxxb']

print(re.findall(r'\d+', s))
# ['012']

print(re.findall(r'a.*b|\d+', s))
# ['axxxb', '012']

source: str_extract_re.py
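Note that alternatives are tried from left to right, and the first one that matches at a given position is used, even if a later alternative would match more text. The following sketch illustrates this with a made-up example; placing the longer alternative first is one way to prefer it.

```python
import re

s = 'python'

# 'py' is tried first and succeeds, so 'python' is never attempted.
short_first = re.findall('py|python', s)
print(short_first)
# ['py']

# With the longer alternative first, the whole word is matched.
long_first = re.findall('python|py', s)
print(long_first)
# ['python']
```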

Case-insensitive matching

By default, matching with the re module is case-sensitive. To perform case-insensitive matching, pass re.IGNORECASE to the flags argument.

s = 'abc-Abc-ABC'

print(re.findall('[a-z]+', s))
# ['abc', 'bc']

print(re.findall('[A-Z]+', s))
# ['A', 'ABC']

print(re.findall('[a-z]+', s, flags=re.IGNORECASE))
# ['abc', 'Abc', 'ABC']

source: str_extract_re.py
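The flag can be combined with the other techniques in this article. As a sketch, the following uses parentheses together with re.IGNORECASE to extract the word after a label regardless of how the label is capitalized. The sample string is made up for illustration.

```python
import re

s = 'Error: disk full; ERROR: retry; error: abort'

# The label matches in any case; only the captured word is returned.
words = re.findall(r'error: (\w+)', s, flags=re.IGNORECASE)
print(words)
# ['disk', 'retry', 'abort']

# Without the flag, only the all-lowercase label matches.
print(re.findall(r'error: (\w+)', s))
# ['abort']
```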
