Replace Strings in Python: replace(), translate(), and Regex
In Python, you can replace strings using the replace() and translate() methods, or with regular expression functions like re.sub() and re.subn().
Additionally, you can replace substrings at specific positions using slicing.
To remove a substring, simply replace it with an empty string ('').
If you need to extract substrings or find their positions, refer to the following articles:
- Extract a substring from a string in Python (position, regex)
- Search for a string in Python (Check if a substring is included/Get a substring position)
To replace the content in a text file, read the file into a string, process it, and save the result back to the file.
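The read, process, and save steps above can be sketched as follows. This is a minimal illustration, not a definitive implementation: the helper name replace_in_file and the sample file are hypothetical, and it assumes the file is small enough to load into memory and is encoded as UTF-8.

```python
import tempfile
from pathlib import Path

def replace_in_file(path, old, new, encoding="utf-8"):
    """Read the file, replace text in memory, and write the result back."""
    p = Path(path)
    text = p.read_text(encoding=encoding)
    p.write_text(text.replace(old, new), encoding=encoding)

# Demo on a throwaway file so that no real data is touched.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    sample.write_text("one two one", encoding="utf-8")
    replace_in_file(sample, "one", "XXX")
    result = sample.read_text(encoding="utf-8")

print(result)
# XXX two XXX
```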
Replace substrings in a string: replace()
Basic usage
The replace() method replaces all occurrences of a substring with another.
Provide the substring to replace as the first argument (old) and the new substring as the second (new).
s = 'one two one two one'
print(s.replace(' ', '-'))
# one-two-one-two-one
To remove substrings, pass an empty string as new.
print(s.replace(' ', ''))
# onetwoonetwoone
Limit the number of replacements: count
You can limit the number of replacements by providing the third argument, count. Only the first count occurrences will be replaced.
s = 'one two one two one'
print(s.replace('one', 'XXX'))
# XXX two XXX two XXX
print(s.replace('one', 'XXX', 2))
# XXX two XXX two one
Replace different substrings
To replace multiple different substrings with the same value, regular expressions are useful (see below).
To replace different substrings with different values, call replace() repeatedly.
s = 'one two one two one'
print(s.replace('one', 'XXX').replace('two', 'YYY'))
# XXX YYY XXX YYY XXX
Note that replacements occur in the order they are called. If a new substring contains another substring targeted later, it will also be replaced.
print(s.replace('one', 'XtwoX').replace('two', 'YYY'))
# XYYYX YYY XYYYX YYY XYYYX
print(s.replace('two', 'YYY').replace('one', 'XtwoX'))
# XtwoX YYY XtwoX YYY XtwoX
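If the order dependence is unwanted, one alternative (not the chained replace() approach shown above) is to perform all replacements in a single pass with re.sub() and a dictionary. The following is a sketch under stated assumptions: the helper name replace_many is hypothetical, and the mapping is assumed to be non-empty with non-empty keys.

```python
import re

def replace_many(s, mapping):
    """Replace every key of `mapping` with its value in a single pass."""
    # Try longer keys first so a key that is a prefix of another does not win.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    # Each match is looked up in the mapping; replaced text is never rescanned.
    return pattern.sub(lambda m: mapping[m.group(0)], s)

s = 'one two one two one'
print(replace_many(s, {'one': 'XtwoX', 'two': 'YYY'}))
# XtwoX YYY XtwoX YYY XtwoX
```

Because the replacement text is not scanned again, the 'two' inside 'XtwoX' is left untouched regardless of the order of the dictionary entries.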
To replace characters individually, use the translate() method, discussed later in this article.
Swap strings
Swapping two substrings using sequential replace() calls may not work as expected.
s = 'one two one two one'
print(s.replace('one', 'two').replace('two', 'one'))
# one one one one one
To handle this, first replace one of the substrings with a temporary placeholder.
print(s.replace('one', 'X').replace('two', 'one').replace('X', 'two'))
# two one two one two
You can wrap this logic in a function:
def swap_str(s_org, s1, s2, temp='*q@w-e~r^'):
    return s_org.replace(s1, temp).replace(s2, s1).replace(temp, s2)
print(swap_str(s, 'one', 'two'))
# two one two one two
Ensure the placeholder value (temp) does not appear in the original string. If necessary, verify its uniqueness before proceeding. In the example above, temp is simply set to an arbitrary string.
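One possible way to guarantee uniqueness is to pick the placeholder automatically. The sketch below is an illustration only: the function name swap_str_safe is hypothetical, it assumes s1 and s2 are non-empty, and it assumes at least one character in the Unicode Private Use Area (U+E000 to U+F8FF) is unused by the inputs. A single-character placeholder is chosen so that it cannot partially overlap with neighboring text.

```python
def swap_str_safe(s_org, s1, s2):
    """Swap s1 and s2 using a one-character placeholder absent from all inputs."""
    used = set(s_org) | set(s1) | set(s2)
    # Search the Private Use Area for a character that appears nowhere in the inputs.
    temp = next(chr(c) for c in range(0xE000, 0xF900) if chr(c) not in used)
    return s_org.replace(s1, temp).replace(s2, s1).replace(temp, s2)

s = 'one two one two one'
print(swap_str_safe(s, 'one', 'two'))
# two one two one two
```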
To swap individual characters, refer to the translate() method later in this article.
Replace newline character
If the string contains only one type of newline character, you can directly pass it as the first argument in replace().
s_lines = 'one\ntwo\nthree'
print(s_lines)
# one
# two
# three
print(s_lines.replace('\n', '-'))
# one-two-three
However, if both \n (LF, used in Unix-based systems including macOS) and \r\n (CRLF, used in Windows) appear in the string, the order of replacements may affect the result because \n is part of \r\n.
In such cases, splitting the string with splitlines() and rejoining it with join() is a safer approach.
s_lines_multi = 'one\ntwo\r\nthree'  # mixes LF and CRLF
print(s_lines_multi.splitlines())
# ['one', 'two', 'three']
print('-'.join(s_lines_multi.splitlines()))
# one-two-three
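To see why the order of replace() calls matters for mixed newlines, compare the two orders below. The sample string s_mixed is a hypothetical example, and repr() is used so that the stray \r is visible.

```python
s_mixed = 'one\ntwo\r\nthree'  # hypothetical sample mixing LF and CRLF

# Replacing '\n' first breaks each '\r\n' apart and leaves a stray '\r'.
lf_first = s_mixed.replace('\n', '-').replace('\r\n', '-')
print(repr(lf_first))
# 'one-two\r-three'

# Replacing '\r\n' first handles both newline styles.
crlf_first = s_mixed.replace('\r\n', '-').replace('\n', '-')
print(repr(crlf_first))
# 'one-two-three'
```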
Replace characters in a string: translate()
Basic usage
The translate() method replaces multiple characters in a string using a translation table created by str.maketrans().
You can pass a dictionary to str.maketrans(), where each key is a single character to be replaced, and the corresponding value is the replacement string or None to remove it.
s = 'one two one two one'
print(s.translate(str.maketrans({'o': 'O', 't': 'T'})))
# One TwO One TwO One
print(s.translate(str.maketrans({'o': 'XXX', 't': None})))
# XXXne wXXX XXXne wXXX XXXne
Alternatively, provide two strings of equal length to map characters one-to-one. An optional third string specifies characters to be removed.
print(s.translate(str.maketrans('ot', 'OT', 'n')))
# Oe TwO Oe TwO Oe
Ensure the first and second strings are the same length.
# print(s.translate(str.maketrans('ow', 'OTX', 'n')))
# ValueError: the first two maketrans arguments must have equal length
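It may help to look at what str.maketrans() actually returns: an ordinary dictionary keyed by Unicode code points. The short sketch below prints the tables for the two calling styles shown above and passes a hand-built dictionary directly to translate(); the variable names are illustrative.

```python
# Dictionary form: keys become code points, values are kept as given.
table = str.maketrans({'o': 'O', 't': None})
print(table)
# {111: 'O', 116: None}

# String form: both keys and values become code points; removals map to None.
table3 = str.maketrans('ot', 'OT', 'n')
print(table3)
# {111: 79, 116: 84, 110: None}

# translate() also accepts a hand-built dictionary keyed by code points.
result = 'one two'.translate({ord('o'): 'O'})
print(result)
# One twO
```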
Swap characters
To swap characters, define a mapping and apply translate().
s = 'one two one two one'
print(s.translate(str.maketrans({'o': 't', 't': 'o'})))
# tne owt tne owt tne
print(s.translate(str.maketrans('ot', 'to')))
# tne owt tne owt tne
Replace strings by regex: re.sub() and re.subn()
If you need to replace substrings based on regex patterns, use the sub() or subn() functions from the re module.
Basic usage
In re.sub(), the first argument is the regex pattern, the second is the replacement string, and the third is the target string.
import re
s = 'aaa@xxx.com bbb@yyy.net ccc@zzz.org'
print(re.sub('[a-z]+@', 'ABC@', s))
# ABC@xxx.com ABC@yyy.net ABC@zzz.org
As with replace(), you can limit the number of replacements with the count argument. Pass it as a keyword argument; passing count positionally is deprecated as of Python 3.13.
print(re.sub('[a-z]+@', 'ABC@', s, count=2))
# ABC@xxx.com ABC@yyy.net ccc@zzz.org
To optimize performance when reusing the same regex, compile the pattern using re.compile() and call its sub() method.
p = re.compile('[a-z]+@')
print(p.sub('ABC@', s))
# ABC@xxx.com ABC@yyy.net ABC@zzz.org
Replace different substrings with the same string
Even if you're not familiar with regex, the following two techniques can be helpful.
Use square brackets ([]) to create a pattern matching any character within the brackets. This pattern allows you to replace multiple characters with the same string.
s = 'aaa@xxx.com bbb@yyy.net ccc@zzz.org'
print(re.sub('[xyz]', '1', s))
# aaa@111.com bbb@111.net ccc@111.org
Use the | operator to match multiple patterns. Each pattern may include special regex characters or literal substrings. This allows you to replace different substrings with the same string.
print(re.sub('com|net|org', 'biz', s))
# aaa@xxx.biz bbb@yyy.biz ccc@zzz.biz
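When the alternatives are meant as literal substrings but contain characters that are special in regex (such as . or parentheses), re.escape() can be applied to each one before joining them with |. The sample text and the targets list below are hypothetical and serve only to illustrate the point.

```python
import re

s = 'price: 1.5 (approx) or 1x5'
targets = ['1.5', '(approx)']  # hypothetical literal substrings

# Without escaping, '.' would match any character, so '1x5' would also match '1.5'.
pattern = '|'.join(re.escape(t) for t in targets)
escaped_result = re.sub(pattern, '?', s)
print(escaped_result)
# price: ? ? or 1x5
```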
Use the matched part in the replacement
By enclosing parts of the pattern in parentheses (()), you can refer to the matched groups in the replacement string.
s = 'aaa@xxx.com bbb@yyy.net ccc@zzz.org'
print(re.sub('([a-z]+)@([a-z]+)', '\\2@\\1', s))
# xxx@aaa.com yyy@bbb.net zzz@ccc.org
print(re.sub('([a-z]+)@([a-z]+)', r'\2@\1', s))
# xxx@aaa.com yyy@bbb.net zzz@ccc.org
In regular strings ('' or ""), use double backslashes (\\1) to reference a group. In raw strings (r'' or r""), a single backslash (\1) works.
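Group references can also be written as \g<number> or \g<name>, which is useful when a digit directly follows the reference or when named groups make the pattern easier to read. The sketch below uses a shortened sample string, and the group names user and host are chosen only for illustration.

```python
import re

s = 'aaa@xxx.com bbb@yyy.net'

# r'\10' would be read as group 10, so \g<1> is used to place a literal 0 after group 1.
numbered = re.sub(r'([a-z]+)@', r'\g<1>0@', s)
print(numbered)
# aaa0@xxx.com bbb0@yyy.net

# Named groups are defined with (?P<name>...) and referenced with \g<name>.
named = re.sub(r'(?P<user>[a-z]+)@(?P<host>[a-z]+)', r'\g<host>@\g<user>', s)
print(named)
# xxx@aaa.com yyy@bbb.net
```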
To perform more complex replacements, provide a function that receives a match object and returns the replacement.
def func(matchobj):
    return matchobj.group(2).upper() + '@' + matchobj.group(1)
print(re.sub('([a-z]+)@([a-z]+)', func, s))
# XXX@aaa.com YYY@bbb.net ZZZ@ccc.org
You can also use a lambda expression:
print(re.sub('([a-z]+)@([a-z]+)', lambda m: m.group(2).upper() + '@' + m.group(1), s))
# XXX@aaa.com YYY@bbb.net ZZZ@ccc.org
Get the number of replacements
The re.subn() function returns a tuple containing the modified string and the number of replacements made.
s = 'aaa@xxx.com bbb@yyy.net ccc@zzz.org'
t = re.subn('[a-z]*@', 'ABC@', s)
print(t)
# ('ABC@xxx.com ABC@yyy.net ABC@zzz.org', 3)
print(type(t))
# <class 'tuple'>
print(t[0])
# ABC@xxx.com ABC@yyy.net ABC@zzz.org
print(t[1])
# 3
re.subn() takes the same arguments as re.sub(); the only difference is that the return value also includes the number of replacements.
As with re.sub(), you can reference parts matched by capturing groups () or specify the maximum number of replacements.
print(re.subn('([a-z]+)@([a-z]+)', r'\2@\1', s, count=2))
# ('xxx@aaa.com yyy@bbb.net ccc@zzz.org', 2)
Replace strings by position: slicing
Although Python does not have a built-in method to replace substrings at specific positions, you can achieve this by splitting the string with slicing and concatenating the parts with the replacement string.
s = 'abcdefghij'
print(s[:4] + 'XXX' + s[7:])
# abcdXXXhij
To avoid hard-coding the end position, compute it from the length of the replacement string using len().
s_replace = 'XXX'
i = 4
print(s[:i] + s_replace + s[i + len(s_replace):])
# abcdXXXhij
This approach works regardless of whether the original and replacement strings have the same length.
print(s[:4] + '-' + s[7:])
# abcd-hij
You can also insert a new substring at any position within the original string using a similar slicing technique.
print(s[:4] + '+++++' + s[4:])
# abcd+++++efghij
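The slicing patterns above can be collected into one small helper. This is a minimal sketch; the function name replace_at and its parameter names are hypothetical, and start and end follow the usual slice convention (end is exclusive).

```python
def replace_at(s, start, end, new):
    """Return s with the slice s[start:end] replaced by new."""
    return s[:start] + new + s[end:]

s = 'abcdefghij'
print(replace_at(s, 4, 7, 'XXX'))
# abcdXXXhij
print(replace_at(s, 4, 7, '-'))
# abcd-hij
print(replace_at(s, 4, 4, '+++++'))  # start == end inserts without removing anything
# abcd+++++efghij
```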