note.nkmk.me

Replace strings in Python (replace, translate, re.sub, re.subn)

Posted: 2019-05-29 / Modified: 2020-08-19 / Tags: Python, String, Regular expression

This post describes how to replace strings in Python.

  • Replace substrings: replace()
    • Specify the maximum count of replacements: count
    • Replace multiple different substrings
    • Replace newline character
  • Replace multiple different characters: translate()
  • Replace with regular expression: re.sub(), re.subn()
    • Replace multiple substrings with the same string
    • Replace using the matched part
    • Get the count of replaced parts
  • Replace by position: slice

In every case, you can delete the matched part by specifying the empty string '' as the replacement string.

Replace substrings: replace()

Use replace() to replace substrings.

Specify the old string old for the first argument and the new string new for the second argument.

s = 'one two one two one'

print(s.replace(' ', '-'))
# one-two-one-two-one

Specifying the empty string '' as new will delete old.

print(s.replace(' ', ''))
# onetwoonetwoone

Specify the maximum count of replacements: count

You can specify the maximum number of replacements in the third argument count. If the argument count is given, only the first count occurrences are replaced.

print(s.replace('one', 'XXX'))
# XXX two XXX two XXX

print(s.replace('one', 'XXX', 2))
# XXX two XXX two one
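
Two details of replace() are easy to overlook: it returns a new string and leaves the original unchanged, and it scans from left to right without overlapping matches. The following is a small sketch of both points; the variable names replaced and overlap are only for illustration.

```python
s = 'one two one two one'

# replace() returns a new string; the original is left untouched.
replaced = s.replace('one', 'XXX', 1)
print(replaced)
# XXX two one two one

print(s)
# one two one two one

# Matches are found left to right and do not overlap.
overlap = 'aaa'.replace('aa', 'b')
print(overlap)
# ba
```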

Replace multiple different substrings

To replace multiple different substrings with the same string, use a regular expression, as described later.

There is no method that replaces multiple different substrings with different strings, but you can call replace() repeatedly.

print(s.replace('one', 'XXX').replace('two', 'YYY'))
# XXX YYY XXX YYY XXX

The calls are simply applied in order, so if the first new contains the second old, that part of the first new is replaced as well. Pay attention to the order of the calls.

print(s.replace('one', 'XtwoX').replace('two', 'YYY'))
# XYYYX YYY XYYYX YYY XYYYX

print(s.replace('two', 'YYY').replace('one', 'XtwoX'))
# XtwoX YYY XtwoX YYY XtwoX
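
One way to avoid the ordering problem is to do all replacements in a single pass with re.sub(), covered later in this post. The following is a minimal sketch, not a built-in feature: replace_many and mapping are hypothetical names introduced here. Because each part of the original string is matched only once, text inserted by one replacement is never replaced again.

```python
import re


def replace_many(text, mapping):
    """Replace every key of mapping with its value in one pass (illustrative helper)."""
    # Try longer keys first so that a key that is a prefix of another does not win.
    keys = sorted(mapping, key=len, reverse=True)
    # re.escape() makes each key match literally, even if it contains regex symbols.
    pattern = re.compile('|'.join(re.escape(k) for k in keys))
    # The function receives each match object and looks up its replacement.
    return pattern.sub(lambda m: mapping[m.group(0)], text)


s = 'one two one two one'

result = replace_many(s, {'one': 'XtwoX', 'two': 'YYY'})
print(result)
# XtwoX YYY XtwoX YYY XtwoX
```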

To replace multiple single characters (strings of length 1), you can use the translate() method described below.

Replace newline character

If there is only one type of newline character, you can specify it as the first argument of replace().

s_lines = 'one\ntwo\nthree'
print(s_lines)
# one
# two
# three

print(s_lines.replace('\n', '-'))
# one-two-three

Be careful when \n (LF), used on Unix-like systems including macOS, and \r\n (CR + LF), used on Windows, are mixed in the same string.

Since \r\n contains \n, replacing \n first leaves the \r behind, so the result depends on the order of the calls. The following example also shows the result of repr(), which displays \n and \r as escape sequences.

s_lines_multi = 'one\ntwo\r\nthree'
print(s_lines_multi)
# one
# two
# three

print(repr(s_lines_multi))
# 'one\ntwo\r\nthree'

print(s_lines_multi.replace('\r\n', '-').replace('\n', '-'))
# one-two-three

print(repr(s_lines_multi.replace('\r\n', '-').replace('\n', '-')))
# 'one-two-three'

print(s_lines_multi.replace('\n', '-').replace('\r\n', '-'))
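# -threeo
# (the leftover \r moves the cursor back to the start of the line, so a terminal overwrites 'one-tw' with '-three')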

print(repr(s_lines_multi.replace('\n', '-').replace('\r\n', '-')))
# 'one-two\r-three'

You can also use splitlines(), which splits a string at various newline characters and returns a list, together with join(), which concatenates the elements of a list with a separator string.

This approach is safe and recommended, especially if you do not know which newline characters the string contains.

print(s_lines_multi.splitlines())
# ['one', 'two', 'three']

print('-'.join(s_lines_multi.splitlines()))
# one-two-three
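
As an alternative, a single regular expression can cover \r\n, \r, and \n at once. The following is a sketch under the assumption that only these three newline forms matter; joined and s_trailing are illustrative names. It also shows one difference from splitlines(): a trailing newline is dropped by splitlines() but replaced by re.sub().

```python
import re

s_lines_multi = 'one\ntwo\r\nthree'

# List \r\n first so that a CR + LF pair is consumed as a single newline.
joined = re.sub(r'\r\n|\r|\n', '-', s_lines_multi)
print(joined)
# one-two-three

s_trailing = 'one\ntwo\n'

print('-'.join(s_trailing.splitlines()))
# one-two

print(re.sub(r'\r\n|\r|\n', '-', s_trailing))
# one-two-
```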

Replace multiple different characters: translate()

Use the translate() method to replace multiple different characters.

The translation table passed to translate() is created with str.maketrans().

Pass str.maketrans() a dictionary whose keys are the old characters and whose values are the new strings.

Each key must be a single character (a string of length 1). Each value is a string of any length or None, where None deletes the old character.

s = 'one two one two one'

print(s.translate(str.maketrans({'o': 'O', 't': 'T'})))
# One TwO One TwO One

print(s.translate(str.maketrans({'o': 'XXX', 't': None})))
# XXXne wXXX XXXne wXXX XXXne

str.maketrans() can also take three strings as arguments instead of a dictionary.

The first argument is a string in which old characters are concatenated, the second argument is a string in which new characters are concatenated, and the third argument is a string in which characters to be deleted are concatenated.

print(s.translate(str.maketrans('ow', 'XY', 'n')))
# Xe tYX Xe tYX Xe

In this case, the lengths of the first and second arguments must match.

# print(s.translate(str.maketrans('ow', 'XXY', 'n')))
# ValueError: the first two maketrans arguments must have equal length
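
It can help to look at what str.maketrans() actually returns: a plain dictionary that maps Unicode code points to their replacements, with None marking deletion. The sketch below stores the table in a variable (table is an illustrative name) so that it can be reused for several strings.

```python
s = 'one two one two one'

# Build the table once; it is an ordinary dict keyed by code point.
table = str.maketrans('ow', 'XY', 'n')
print(table)
# {111: 88, 119: 89, 110: None}

# The same table can be applied to any number of strings.
translated = s.translate(table)
print(translated)
# Xe tYX Xe tYX Xe

print('now'.translate(table))
# XY
```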

Replace with regular expression: re.sub(), re.subn()

replace() and translate() replace only exact matches of the specified string or characters.

To replace text that matches a regular expression pattern rather than a fixed string, use sub() of the re module.

In re.sub(), specify the regular expression pattern as the first argument, the new string as the second argument, and the string to be processed as the third argument.

import re

s = 'aaa@xxx.com bbb@yyy.com ccc@zzz.com'

print(re.sub('[a-z]*@', 'ABC@', s))
# ABC@xxx.com ABC@yyy.com ABC@zzz.com

As with replace(), you can limit the number of replacements with the count argument. Pass it as a keyword argument; passing count positionally is deprecated since Python 3.13.

print(re.sub('[a-z]*@', 'ABC@', s, count=2))
# ABC@xxx.com ABC@yyy.com ccc@zzz.com
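
Note that * means zero or more repetitions, so the pattern [a-z]*@ also matches an @ that has no letters before it. If at least one letter should be required, + can be used instead. The sketch below uses a made-up input string, s_at, to show the difference.

```python
import re

s_at = 'contact @ home'

# With *, a bare '@' still matches (zero letters followed by '@').
star = re.sub('[a-z]*@', 'ABC@', s_at)
print(star)
# contact ABC@ home

# With +, at least one letter must precede '@', so nothing is replaced.
plus = re.sub('[a-z]+@', 'ABC@', s_at)
print(plus)
# contact @ home
```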

Replace multiple substrings with the same string

The following two patterns are worth remembering even if you are not familiar with regular expressions.

Characters enclosed in [] match any single one of those characters. This can be used to replace multiple different characters with the same string.

print(re.sub('[xyz]', '1', s))
# aaa@111.com bbb@111.com ccc@111.com

Patterns separated by | match if any one of them matches. Each alternative can use regular expression special characters, but a plain string that contains no special characters can also be written as is.

This can be used to replace multiple different strings with the same string.

print(re.sub('aaa|bbb|ccc', 'ABC', s))
# ABC@xxx.com ABC@yyy.com ABC@zzz.com
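
If the plain strings contain characters that are special in regular expressions, such as . or +, they need to be escaped before being joined with |. One way is re.escape(). The following is a sketch with made-up sample data; targets and text are illustrative names.

```python
import re

targets = ['1+1', 'a.b']
text = '1+1 = 2, a.b, axb'

# Without escaping, '+' and '.' keep their regex meaning:
# '1+1' no longer matches the literal text, and 'a.b' also matches 'axb'.
unescaped = re.sub('|'.join(targets), '#', text)
print(unescaped)
# 1+1 = 2, #, #

# re.escape() makes every target match literally.
escaped = re.sub('|'.join(re.escape(t) for t in targets), '#', text)
print(escaped)
# # = 2, #, axb
```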

Replace using the matched part

If part of the pattern is enclosed in (), the text matched by that part can be used in the new string.

print(re.sub('([a-z]*)@', '\\1-123@', s))
# aaa-123@xxx.com bbb-123@yyy.com ccc-123@zzz.com

print(re.sub('([a-z]*)@', r'\1-123@', s))
# aaa-123@xxx.com bbb-123@yyy.com ccc-123@zzz.com

\1 refers to the text matched by the first (). If there are multiple (), refer to them as \2, \3, and so on.

In a normal string enclosed in '' or "", the backslash must be escaped as \\1. In a raw string prefixed with r, such as r'', you can write \1 as is.
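
Besides \1, re.sub() accepts a few other ways to refer to the matched part. The sketch below shows three of them: a named group with \g<name>, the \g<1> form for when a digit directly follows the reference, and a function as the replacement. The group name user and the variable names are chosen here only for illustration.

```python
import re

s = 'aaa@xxx.com bbb@yyy.com ccc@zzz.com'

# Named group: (?P<user>...) in the pattern, \g<user> in the replacement.
named = re.sub(r'(?P<user>[a-z]+)@', r'\g<user>-123@', s)
print(named)
# aaa-123@xxx.com bbb-123@yyy.com ccc-123@zzz.com

# \g<1> keeps the group number apart from a following digit;
# r'\10@' would be read as a reference to group 10.
numbered = re.sub(r'([a-z]+)@', r'\g<1>0@', s)
print(numbered)
# aaa0@xxx.com bbb0@yyy.com ccc0@zzz.com

# A function receives each match object and returns the replacement string.
upper = re.sub(r'([a-z]+)@', lambda m: m.group(1).upper() + '@', s)
print(upper)
# AAA@xxx.com BBB@yyy.com CCC@zzz.com
```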

Get the count of replaced parts

re.subn() returns a tuple of the replaced string and the number of parts replaced.

t = re.subn('[a-z]*@', 'ABC@', s)
print(t)
# ('ABC@xxx.com ABC@yyy.com ABC@zzz.com', 3)

print(type(t))
# <class 'tuple'>

print(t[0])
# ABC@xxx.com ABC@yyy.com ABC@zzz.com

print(t[1])
# 3
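
Since the result is a tuple, it can be unpacked directly into two variables. A count of 0 is a simple way to detect that nothing matched. This is a small sketch; the variable names are illustrative.

```python
import re

s = 'aaa@xxx.com bbb@yyy.com ccc@zzz.com'

# Unpack the new string and the number of replacements in one step.
new_s, n = re.subn('[a-z]*@', 'ABC@', s)
print(n)
# 3

# When nothing matches, the string comes back unchanged with a count of 0.
unchanged, n_none = re.subn('[0-9]+', '#', s)
print(n_none)
# 0

print(unchanged == s)
# True
```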

Replace by position: slice

There is no method that replaces text at a specified position, but you can create a new string with that position replaced by splitting the original with slices and concatenating the pieces with an arbitrary string.

s = 'abcdefghij'

print(s[:4] + 'XXX' + s[7:])
# abcdXXXhij

The length of the string (number of characters) can be obtained with len(), so it can be written as follows:

s_replace = 'XXX'
i = 4

print(s[:i] + s_replace + s[i + len(s_replace):])
# abcdXXXhij

The number of characters does not have to match, because the new string is simply concatenated between the two slices.

print(s[:4] + '-' + s[7:])
# abcd-hij

You can also create a new string by inserting another string at any position in the original.

print(s[:4] + '+++++' + s[4:])
# abcd+++++efghij
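
The slicing patterns above can be wrapped in a small helper. This is only a sketch: replace_at is a hypothetical function name, not something provided by Python, and it assumes start and end are ordinary non-negative slice indices with start <= end.

```python
def replace_at(text, start, end, new):
    """Return text with the slice text[start:end] replaced by new (illustrative helper)."""
    return text[:start] + new + text[end:]


s = 'abcdefghij'

print(replace_at(s, 4, 7, 'XXX'))
# abcdXXXhij

# The replacement may be shorter or longer than the removed part.
print(replace_at(s, 4, 7, '-'))
# abcd-hij

# With start == end nothing is removed, so the new string is inserted.
print(replace_at(s, 4, 4, '+++++'))
# abcd+++++efghij
```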

