pandas: Replace values in DataFrame and Series with replace()

Posted: 2024-01-17 | Tags: Python, pandas

In pandas, the replace() method allows you to replace values in DataFrame and Series. It is also possible to replace parts of strings using regular expressions (regex).

Contents

Replace values in DataFrame
Replace different values at once
- Specify with a dictionary
- Specify with a list
Replace values in specific columns
Replace using regular expressions (regex)
Replace missing values NaN
Inplace operation

The map() method also replaces values in Series. Regex cannot be used, but in some cases, map() may be faster than replace().

pandas: Replace Series values with map()

For information on how to replace values based on conditions, see the following article.

pandas: Replace values based on conditions with where(), mask()

The pandas version used in this article is as follows. Note that functionality may vary between versions. The following data is used as an example, with some values changed for explanation.

sample_pandas_normal.csv

import pandas as pd

print(pd.__version__)
# 2.1.2

df = pd.read_csv('data/src/sample_pandas_normal.csv')
df.iloc[1, 3] = 24
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     24
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

source: pandas_replace.py

Although the sample code focuses on DataFrame, the usage is the same for Series.

Replace values in `DataFrame`

Use the replace() method by specifying the original value as the first argument and the replacement value as the second.

print(df.replace('CA', 'California'))
#       name  age       state  point
# 0    Alice   24          NY     64
# 1      Bob   42  California     24
# 2  Charlie   18  California     70
# 3     Dave   68          TX     70
# 4    Ellen   24  California     88
# 5    Frank   30          NY     57

source: pandas_replace.py

All columns' values are replaced. To replace values only in a specific column, see the following section.

print(df.replace(24, 100))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA    100
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

source: pandas_replace.py

Replace different values at once

The replace() method can simultaneously replace various values using a dictionary or a list.

Specify with a dictionary

Specify a dictionary, {original_value: replacement_value}, as the first argument.

print(df.replace({'CA': 'California', 24: 100}))
#       name  age       state  point
# 0    Alice  100          NY     64
# 1      Bob   42  California    100
# 2  Charlie   18  California     70
# 3     Dave   68          TX     70
# 4    Ellen  100  California     88
# 5    Frank   30          NY     57

source: pandas_replace.py

Specify with a list

Specify a list of original values as the first argument and a list of replacement values as the second. An error occurs if the sizes (number of elements) of the lists do not match.

print(df.replace(['CA', 24], ['California', 100]))
#       name  age       state  point
# 0    Alice  100          NY     64
# 1      Bob   42  California    100
# 2  Charlie   18  California     70
# 3     Dave   68          TX     70
# 4    Ellen  100  California     88
# 5    Frank   30          NY     57

# print(df.replace(['CA', 24, 'NY'], ['California', 100]))
# ValueError: Replacement lists must match in length. Expecting 3 got 2

source: pandas_replace.py

Specifying a scalar value as the second argument replaces all original values in the list with this scalar.

print(df.replace(['CA', 24], 'XXX'))
#       name  age state point
# 0    Alice  XXX    NY    64
# 1      Bob   42   XXX   XXX
# 2  Charlie   18   XXX    70
# 3     Dave   68    TX    70
# 4    Ellen  XXX   XXX    88
# 5    Frank   30    NY    57

source: pandas_replace.py

Replace values in specific columns

Specifying a dictionary, {column_name: {original_value: replacement_value}}, as the first argument allows you to replace values only in specific columns.

print(df.replace({'age': {24: 100}}))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA     24
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

print(df.replace({'age': {24: 100, 18: 0}, 'point': {24: 50}}))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA     50
# 2  Charlie    0    CA     70
# 3     Dave   68    TX     70
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

source: pandas_replace.py

In this case, you cannot use lists for original and replacement values.

# print(df.replace({'age': [[24, 18], [100, 0]], 'point': {24: 50}}))
# TypeError: If a nested mapping is passed, all values of the top level mapping must be mappings

source: pandas_replace.py

To replace multiple values with a single value, specify a dictionary, {column_name: original_value}, as the first argument and the replacement value as a scalar in the second argument. Lists can be used for original values in the first argument.

print(df.replace({'age': 24, 'point': 70}, 100))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA     24
# 2  Charlie   18    CA    100
# 3     Dave   68    TX    100
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

print(df.replace({'age': [24, 18], 'point': 70}, 100))
#       name  age state  point
# 0    Alice  100    NY     64
# 1      Bob   42    CA     24
# 2  Charlie  100    CA    100
# 3     Dave   68    TX    100
# 4    Ellen  100    CA     88
# 5    Frank   30    NY     57

source: pandas_replace.py

Replace using regular expressions (regex)

By setting the regex argument to True, you can use regular expressions (regex) for replacements in the replace() method.

By default, replacements occur only for complete value matches, not partial string matches.

print(df.replace('li', 'XX'))
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     24
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

source: pandas_replace.py

When regex is set to True, re.sub() from the re module is used for replacements.

Specify a regex pattern as the first argument and the replacement string as the second. For example, you can replace strings ending with e as follows:

print(df.replace('.*e$', 'NEW_NAME', regex=True))
#        name  age state  point
# 0  NEW_NAME   24    NY     64
# 1       Bob   42    CA     24
# 2  NEW_NAME   18    CA     70
# 3  NEW_NAME   68    TX     70
# 4     Ellen   24    CA     88
# 5     Frank   30    NY     57

source: pandas_replace.py

In this regex pattern, . matches any character, * represents zero or more repetitions, and $ signifies the end of the string.

If regex special characters are not used, specified substrings are replaced.

print(df.replace('li', 'XX', regex=True))
#       name  age state  point
# 0    AXXce   24    NY     64
# 1      Bob   42    CA     24
# 2  CharXXe   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

source: pandas_replace.py

Groups surrounded by () can be referenced in the replacement value as \1, \2, and so on, corresponding to their order in the pattern.

print(df.replace('(.*)li(.*)', r'\2-\1', regex=True))
#      name  age state  point
# 0    ce-A   24    NY     64
# 1     Bob   42    CA     24
# 2  e-Char   18    CA     70
# 3    Dave   68    TX     70
# 4   Ellen   24    CA     88
# 5   Frank   30    NY     57

source: pandas_replace.py

For details on re.sub(), see the following article.

Regular expressions with the re module in Python

Alternatively, str.replace() can be used to replace parts of strings in a specific column.

pandas: Handle strings (replace, strip, case conversion, etc.)

df['name'] = df['name'].str.replace('li', 'XX')
print(df)
#       name  age state  point
# 0    AXXce   24    NY     64
# 1      Bob   42    CA     24
# 2  CharXXe   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

source: pandas_replace.py

Replace missing values `NaN`

To replace missing values NaN, you can use the fillna() method. For details, see the following article.

pandas: Replace NaN (missing values) with fillna()

Inplace operation

By default, replace() returns a new DataFrame with the replaced values. Setting the inplace argument to True modifies the original DataFrame.

df = pd.read_csv('data/src/sample_pandas_normal.csv')
print(df)
#       name  age state  point
# 0    Alice   24    NY     64
# 1      Bob   42    CA     92
# 2  Charlie   18    CA     70
# 3     Dave   68    TX     70
# 4    Ellen   24    CA     88
# 5    Frank   30    NY     57

df.replace('CA', 'California', inplace=True)
print(df)
#       name  age       state  point
# 0    Alice   24          NY     64
# 1      Bob   42  California     92
# 2  Charlie   18  California     70
# 3     Dave   68          TX     70
# 4    Ellen   24  California     88
# 5    Frank   30          NY     57

source: pandas_replace.py

Related Categories

Related Articles