pandas: Replace values in DataFrame and Series with replace()

Posted: | Tags: Python, pandas

In pandas, the replace() method allows you to replace values in DataFrame and Series. It is also possible to replace parts of strings using regular expressions (regex).

The map() method also replaces values in Series. Regex cannot be used, but in some cases, map() may be faster than replace().

For information on how to replace values based on conditions, see the following article.

The pandas version used in this article is as follows. Note that functionality may vary between versions. The following data is used as an example, with some values changed for explanation.

import pandas as pd

print(pd.__version__)
# 2.1.2

df = pd.read_csv('data/src/sample_pandas_normal.csv')
df.iloc[1, 3] = 24
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     24
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

Although the sample code focuses on DataFrame, the usage is the same for Series.

Replace values in DataFrame

Use the replace() method by specifying the original value as the first argument and the replacement value as the second.

print(df.replace('CA', 'California'))
#       name  age       state  point
# 0    Alice   24          NY     64
# 1      Bob   42  California     24
# 2  Charlie   18  California     70
# 3     Dave   68          TX     70
# 4    Ellen   24  California     88
# 5    Frank   30          NY     57

All columns' values are replaced. To replace values only in a specific column, see the following section.

print(df.replace(24, 100))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA    100
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

Replace different values at once

The replace() method can simultaneously replace various values using a dictionary or a list.

Specify with a dictionary

Specify a dictionary, {original_value: replacement_value}, as the first argument.

print(df.replace({'CA': 'California', 24: 100}))
#       name  age       state  point
# 0    Alice  100          NY     64
# 1      Bob   42  California    100
# 2  Charlie   18  California     70
# 3     Dave   68          TX     70
# 4    Ellen  100  California     88
# 5    Frank   30          NY     57

Specify with a list

Specify a list of original values as the first argument and a list of replacement values as the second. An error occurs if the sizes (number of elements) of the lists do not match.

print(df.replace(['CA', 24], ['California', 100]))
#       name  age       state  point
# 0    Alice  100          NY     64
# 1      Bob   42  California    100
# 2  Charlie   18  California     70
# 3     Dave   68          TX     70
# 4    Ellen  100  California     88
# 5    Frank   30          NY     57

# print(df.replace(['CA', 24, 'NY'], ['California', 100]))
# ValueError: Replacement lists must match in length. Expecting 3 got 2

Specifying a scalar value as the second argument replaces all original values in the list with this scalar.

print(df.replace(['CA', 24], 'XXX'))
#       name  age state point
# 0    Alice  XXX    NY    64
# 1      Bob   42   XXX   XXX
# 2  Charlie   18   XXX    70
# 3     Dave   68    TX    70
# 4    Ellen  XXX   XXX    88
# 5    Frank   30    NY    57

Replace values in specific columns

Specifying a dictionary, {column_name: {original_value: replacement_value}}, as the first argument allows you to replace values only in specific columns.

print(df.replace({'age': {24: 100}}))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA     24
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

print(df.replace({'age': {24: 100, 18: 0}, 'point': {24: 50}}))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA     50
# 2  Charlie    0    CA     70
# 3     Dave   68    TX     70
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

In this case, you cannot use lists for original and replacement values.

# print(df.replace({'age': [[24, 18], [100, 0]], 'point': {24: 50}}))
# TypeError: If a nested mapping is passed, all values of the top level mapping must be mappings

To replace multiple values with a single value, specify a dictionary, {column_name: original_value}, as the first argument and the replacement value as a scalar in the second argument. Lists can be used for original values in the first argument.

print(df.replace({'age': 24, 'point': 70}, 100))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA     24
# 2  Charlie   18    CA    100
# 3     Dave   68    TX    100
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

print(df.replace({'age': [24, 18], 'point': 70}, 100))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA     24
# 2  Charlie  100    CA    100
# 3     Dave   68    TX    100
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

Replace using regular expressions (regex)

By setting the regex argument to True, you can use regular expressions (regex) for replacements in the replace() method.

By default, replacements occur only for complete value matches, not partial string matches.

print(df.replace('li', 'XX'))
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     24
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

When regex is set to True, re.sub() from the re module is used for replacements.

Specify a regex pattern as the first argument and the replacement string as the second. For example, you can replace strings ending with e as follows:

print(df.replace('.*e$', 'NEW_NAME', regex=True))
#        name  age state  point
# 0  NEW_NAME   24    NY     64
# 1       Bob   42    CA     24
# 2  NEW_NAME   18    CA     70
# 3  NEW_NAME   68    TX     70
# 4     Ellen   24    CA     88
# 5     Frank   30    NY     57

In this regex pattern, . matches any character, * represents zero or more repetitions, and $ signifies the end of the string.

If regex special characters are not used, specified substrings are replaced.

print(df.replace('li', 'XX', regex=True))
#       name  age state  point
# 0    AXXce   24    NY     64
# 1      Bob   42    CA     24
# 2  CharXXe   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

Groups surrounded by () can be referenced in the replacement value as \1, \2, and so on, corresponding to their order in the pattern.

print(df.replace('(.*)li(.*)', r'\2-\1', regex=True))
#      name  age state  point
# 0    ce-A   24    NY     64
# 1     Bob   42    CA     24
# 2  e-Char   18    CA     70
# 3    Dave   68    TX     70
# 4   Ellen   24    CA     88
# 5   Frank   30    NY     57

For details on re.sub(), see the following article.

Alternatively, str.replace() can be used to replace parts of strings in a specific column.

df['name'] = df['name'].str.replace('li', 'XX')
print(df)
#       name  age state  point
# 0    AXXce   24    NY     64
# 1      Bob   42    CA     24
# 2  CharXXe   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

Replace missing values NaN

To replace missing values NaN, you can use the fillna() method. For details, see the following article.

Inplace operation

By default, replace() returns a new DataFrame with the replaced values. Setting the inplace argument to True modifies the original DataFrame.

df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

df.replace('CA', 'California', inplace=True)
print(df)
#       name  age       state  point
# 0    Alice   24          NY     64
# 1      Bob   42  California     92
# 2  Charlie   18  California     70
# 3     Dave   68          TX     70
# 4    Ellen   24  California     88
# 5    Frank   30          NY     57

Related Categories

Related Articles