pandas: Replace values in DataFrame and Series with replace()
In pandas, the replace()
method allows you to replace values in DataFrame
and Series
. It is also possible to replace parts of strings using regular expressions (regex).
- pandas.DataFrame.replace — pandas 2.1.3 documentation
- pandas.Series.replace — pandas 2.1.3 documentation
The map()
method also replaces values in Series
. Regex cannot be used, but in some cases, map()
may be faster than replace()
.
For information on how to replace values based on conditions, see the following article.
The pandas version used in this article is as follows. Note that functionality may vary between versions. The following data is used as an example, with some values changed for explanation.
import pandas as pd
print(pd.__version__)
# 2.1.2
df = pd.read_csv('data/src/sample_pandas_normal.csv')
df.iloc[1, 3] = 24
print(df)
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 24
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
Although the sample code focuses on DataFrame
, the usage is the same for Series
.
Replace values in DataFrame
Use the replace()
method by specifying the original value as the first argument and the replacement value as the second.
print(df.replace('CA', 'California'))
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 California 24
# 2 Charlie 18 California 70
# 3 Dave 68 TX 70
# 4 Ellen 24 California 88
# 5 Frank 30 NY 57
All columns' values are replaced. To replace values only in a specific column, see the following section.
print(df.replace(24, 100))
# name age state point
# 0 Alice 100 NY 64
# 1 Bob 42 CA 100
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 100 CA 88
# 5 Frank 30 NY 57
Replace different values at once
The replace()
method can simultaneously replace various values using a dictionary or a list.
Specify with a dictionary
Specify a dictionary, {original_value: replacement_value}
, as the first argument.
print(df.replace({'CA': 'California', 24: 100}))
# name age state point
# 0 Alice 100 NY 64
# 1 Bob 42 California 100
# 2 Charlie 18 California 70
# 3 Dave 68 TX 70
# 4 Ellen 100 California 88
# 5 Frank 30 NY 57
Specify with a list
Specify a list of original values as the first argument and a list of replacement values as the second. An error occurs if the sizes (number of elements) of the lists do not match.
print(df.replace(['CA', 24], ['California', 100]))
# name age state point
# 0 Alice 100 NY 64
# 1 Bob 42 California 100
# 2 Charlie 18 California 70
# 3 Dave 68 TX 70
# 4 Ellen 100 California 88
# 5 Frank 30 NY 57
# print(df.replace(['CA', 24, 'NY'], ['California', 100]))
# ValueError: Replacement lists must match in length. Expecting 3 got 2
Specifying a scalar value as the second argument replaces all original values in the list with this scalar.
print(df.replace(['CA', 24], 'XXX'))
# name age state point
# 0 Alice XXX NY 64
# 1 Bob 42 XXX XXX
# 2 Charlie 18 XXX 70
# 3 Dave 68 TX 70
# 4 Ellen XXX XXX 88
# 5 Frank 30 NY 57
Replace values in specific columns
Specifying a dictionary, {column_name: {original_value: replacement_value}}
, as the first argument allows you to replace values only in specific columns.
print(df.replace({'age': {24: 100}}))
# name age state point
# 0 Alice 100 NY 64
# 1 Bob 42 CA 24
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 100 CA 88
# 5 Frank 30 NY 57
print(df.replace({'age': {24: 100, 18: 0}, 'point': {24: 50}}))
# name age state point
# 0 Alice 100 NY 64
# 1 Bob 42 CA 50
# 2 Charlie 0 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 100 CA 88
# 5 Frank 30 NY 57
In this case, you cannot use lists for original and replacement values.
# print(df.replace({'age': [[24, 18], [100, 0]], 'point': {24: 50}}))
# TypeError: If a nested mapping is passed, all values of the top level mapping must be mappings
To replace multiple values with a single value, specify a dictionary, {column_name: original_value}
, as the first argument and the replacement value as a scalar in the second argument. Lists can be used for original values in the first argument.
print(df.replace({'age': 24, 'point': 70}, 100))
# name age state point
# 0 Alice 100 NY 64
# 1 Bob 42 CA 24
# 2 Charlie 18 CA 100
# 3 Dave 68 TX 100
# 4 Ellen 100 CA 88
# 5 Frank 30 NY 57
print(df.replace({'age': [24, 18], 'point': 70}, 100))
# name age state point
# 0 Alice 100 NY 64
# 1 Bob 42 CA 24
# 2 Charlie 100 CA 100
# 3 Dave 68 TX 100
# 4 Ellen 100 CA 88
# 5 Frank 30 NY 57
Replace using regular expressions (regex)
By setting the regex
argument to True
, you can use regular expressions (regex) for replacements in the replace()
method.
By default, replacements occur only for complete value matches, not partial string matches.
print(df.replace('li', 'XX'))
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 24
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
When regex
is set to True
, re.sub()
from the re
module is used for replacements.
Specify a regex pattern as the first argument and the replacement string as the second. For example, you can replace strings ending with e
as follows:
print(df.replace('.*e$', 'NEW_NAME', regex=True))
# name age state point
# 0 NEW_NAME 24 NY 64
# 1 Bob 42 CA 24
# 2 NEW_NAME 18 CA 70
# 3 NEW_NAME 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
In this regex pattern, .
matches any character, *
represents zero or more repetitions, and $
signifies the end of the string.
If regex special characters are not used, specified substrings are replaced.
print(df.replace('li', 'XX', regex=True))
# name age state point
# 0 AXXce 24 NY 64
# 1 Bob 42 CA 24
# 2 CharXXe 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
Groups surrounded by ()
can be referenced in the replacement value as \1
, \2
, and so on, corresponding to their order in the pattern.
print(df.replace('(.*)li(.*)', r'\2-\1', regex=True))
# name age state point
# 0 ce-A 24 NY 64
# 1 Bob 42 CA 24
# 2 e-Char 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
For details on re.sub()
, see the following article.
Alternatively, str.replace()
can be used to replace parts of strings in a specific column.
df['name'] = df['name'].str.replace('li', 'XX')
print(df)
# name age state point
# 0 AXXce 24 NY 64
# 1 Bob 42 CA 24
# 2 CharXXe 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
Replace missing values NaN
To replace missing values NaN
, you can use the fillna()
method. For details, see the following article.
Inplace operation
By default, replace()
returns a new DataFrame
with the replaced values. Setting the inplace
argument to True
modifies the original DataFrame
.
df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 CA 92
# 2 Charlie 18 CA 70
# 3 Dave 68 TX 70
# 4 Ellen 24 CA 88
# 5 Frank 30 NY 57
df.replace('CA', 'California', inplace=True)
print(df)
# name age state point
# 0 Alice 24 NY 64
# 1 Bob 42 California 92
# 2 Charlie 18 California 70
# 3 Dave 68 TX 70
# 4 Ellen 24 California 88
# 5 Frank 30 NY 57