pandas: Replace Series values with map()

Posted: 2024-01-17 | Tags: Python, pandas

In pandas, you can replace values in a Series by passing a dictionary to the map() method. The replace() method can do the same, but map() can be faster, particularly when the dictionary covers many of the values in the Series.

Contents

Differences between map() and replace() in replacement
Speed comparison

map() is also used to apply functions to each value in a Series.

pandas: Apply functions to elements, rows, columns with map(), apply()

The pandas version used in this article is as follows. Note that functionality may vary between versions.

import pandas as pd

print(pd.__version__)
# 2.1.2

source: pandas_map_replace.py

Differences between `map()` and `replace()` in replacement

When a dictionary (dict) is specified in map(), values in the Series matching a dictionary key are replaced with the corresponding dictionary value.

s = pd.Series(['A', 'B', 'C', 'A', 'B'])
print(s)
# 0    A
# 1    B
# 2    C
# 3    A
# 4    B
# dtype: object

print(s.map({'A': 'XX', 'B': 'YY', 'C': 'ZZ'}))
# 0    XX
# 1    YY
# 2    ZZ
# 3    XX
# 4    YY
# dtype: object

source: pandas_map_replace.py

You can also specify a dictionary in replace(). If all values in the Series are to be replaced, the result is the same as with map().

print(s.replace({'A': 'XX', 'B': 'YY', 'C': 'ZZ'}))
# 0    XX
# 1    YY
# 2    ZZ
# 3    XX
# 4    YY
# dtype: object

source: pandas_map_replace.py

When the dictionary keys do not cover all values in the Series, the results differ. With map(), unmatched values become NaN, whereas with replace(), they remain unchanged.

print(s.map({'A': 'XX'}))
# 0     XX
# 1    NaN
# 2    NaN
# 3     XX
# 4    NaN
# dtype: object

print(s.replace({'A': 'XX'}))
# 0    XX
# 1     B
# 2     C
# 3    XX
# 4     B
# dtype: object

source: pandas_map_replace.py

To keep the values that map() does not match, pass the original Series to fillna() so that each NaN is filled with the original value at the same position.

pandas: Replace NaN (missing values) with fillna()

print(s.map({'A': 'XX'}).fillna(s))
# 0    XX
# 1     B
# 2     C
# 3    XX
# 4     B
# dtype: object

source: pandas_map_replace.py
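As an alternative to fillna(), the pandas documentation for map() notes that when the dictionary is a dict subclass defining `__missing__`, that method supplies the value for unmatched keys instead of NaN. The following is a minimal sketch of that behavior; the class name KeepUnmatched is made up for this example and is not part of pandas.

```python
import pandas as pd


class KeepUnmatched(dict):
    """Hypothetical helper: a dict that returns the key itself when it is missing."""

    def __missing__(self, key):
        return key


s = pd.Series(['A', 'B', 'C', 'A', 'B'])

# Unmatched values ('B' and 'C') pass through instead of becoming NaN.
result = s.map(KeepUnmatched({'A': 'XX'}))
print(result.tolist())
# ['XX', 'B', 'C', 'XX', 'B']
```

Whether this is preferable to fillna() likely depends on the data, so it may be worth timing both if performance matters.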

Note that replace() allows for more complex operations such as using regular expressions to replace parts of strings, or replacing values differently for each column in a DataFrame. For more details, see the following article.

pandas: Replace values in DataFrame and Series with replace()

Speed comparison

The execution times of map() and replace() below are measured with the Jupyter Notebook magic command %%timeit. Note that %%timeit does not work in a regular Python script.

Measure execution time with timeit in Python

Consider a Series of 100 values.

s = pd.Series(range(100))

source: pandas_map_replace_timeit.py

When the dictionary covers all 100 values, map() is considerably faster than replace().

d_100 = {i: i * 10 for i in range(100)}

%%timeit
s.map(d_100)
# 70.7 µs ± 2.08 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
s.replace(d_100)
# 1.31 ms ± 26.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

source: pandas_map_replace_timeit.py

Even when replacing with a dictionary of 50 elements, map() combined with fillna() is faster than replace().

d_50 = {i: i * 10 for i in range(50)}

%%timeit
s.map(d_50).fillna(s)
# 108 µs ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
s.replace(d_50)
# 653 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

source: pandas_map_replace_timeit.py
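The two calls timed above are not fully interchangeable on an integer Series. map() introduces NaN for unmatched values, which turns the intermediate result into floats, and fillna() does not convert it back. The sketch below rebuilds s and d_50 from the example above to check this. The astype() call at the end is one possible way to restore the integer dtype, under the assumption that every value is a whole number.

```python
import pandas as pd

s = pd.Series(range(100))
d_50 = {i: i * 10 for i in range(50)}

mapped = s.map(d_50).fillna(s)
replaced = s.replace(d_50)

# The values agree, but the dtypes do not.
print((mapped == replaced).all())
# True
print(mapped.dtype, replaced.dtype)
# float64 int64

# Casting back is safe here only because every value is a whole number.
print(mapped.astype(replaced.dtype).equals(replaced))
# True
```

The extra cast adds a little time, so it should be included in the measurement if the integer dtype is required.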

However, when replacing with a dictionary of 5 elements, replace() is faster than map() combined with fillna().

d_5 = {i: i * 10 for i in range(5)}

%%timeit
s.map(d_5).fillna(s)
# 104 µs ± 3.85 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
s.replace(d_5)
# 78.5 µs ± 860 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

source: pandas_map_replace_timeit.py

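The execution time of replace() depends heavily on the size of the dictionary. In the results above, it grows from about 78.5 µs with 5 keys to 1.31 ms with 100 keys, while map() combined with fillna() stays at roughly 100 µs for both 5 and 50 keys.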

Results vary with the execution environment and the data, so when speed matters, measure both map() and replace() under realistic conditions before choosing one.
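Outside Jupyter Notebook, a comparison like this can be run with the standard library's timeit module. The following is a rough sketch rather than a rigorous benchmark: the bench() helper, the loop counts, and the dictionary sizes are arbitrary choices for illustration, and the Series should be swapped for data that resembles the real workload.

```python
import timeit

import pandas as pd

s = pd.Series(range(100))


def bench(func, number=100, repeat=3):
    """Return the best average time per call, in microseconds."""
    times = timeit.repeat(func, number=number, repeat=repeat)
    return min(times) / number * 1e6


results = {}
for n in (5, 50, 100):
    d = {i: i * 10 for i in range(n)}
    results[n] = {
        'map + fillna': bench(lambda: s.map(d).fillna(s)),
        'replace': bench(lambda: s.replace(d)),
    }

# Print one line per dictionary size.
for n, timings in results.items():
    print(n, {name: f'{t:.1f} µs' for name, t in timings.items()})
```

Taking the minimum over the repeats is a common convention for reducing noise from other processes; the absolute numbers will differ from machine to machine.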
