pandas: Replace Series values with map()

Tags: Python, pandas

In pandas, you can replace values in a Series using the map() method with a dictionary. The replace() method can also replace values, but depending on the conditions, map() may be faster.

map() is also used to apply functions to each value in a Series.
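As a brief illustration of this second use, the following sketch passes a function instead of a dictionary. The variable names lowered and suffixed are only for this example.

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'A', 'B'])

# Pass an existing function: it is called once for each value.
lowered = s.map(str.lower)
print(lowered.tolist())
# ['a', 'b', 'c', 'a', 'b']

# A lambda expression works the same way.
suffixed = s.map(lambda x: x + '_1')
print(suffixed.tolist())
# ['A_1', 'B_1', 'C_1', 'A_1', 'B_1']
```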

The pandas version used in this article is as follows. Note that functionality may vary between versions.

import pandas as pd

print(pd.__version__)
# 2.1.2

Differences between map() and replace() in replacement

When a dictionary (dict) is specified in map(), values in the Series matching a dictionary key are replaced with the corresponding dictionary value.

s = pd.Series(['A', 'B', 'C', 'A', 'B'])
print(s)
# 0    A
# 1    B
# 2    C
# 3    A
# 4    B
# dtype: object

print(s.map({'A': 'XX', 'B': 'YY', 'C': 'ZZ'}))
# 0    XX
# 1    YY
# 2    ZZ
# 3    XX
# 4    YY
# dtype: object

You can also specify a dictionary in replace(). If the dictionary keys cover every value in the Series, the result is the same as with map().

print(s.replace({'A': 'XX', 'B': 'YY', 'C': 'ZZ'}))
# 0    XX
# 1    YY
# 2    ZZ
# 3    XX
# 4    YY
# dtype: object

When the dictionary keys do not cover all values in the Series, the results differ. With map(), unmatched values become NaN, whereas with replace(), they remain unchanged.

print(s.map({'A': 'XX'}))
# 0     XX
# 1    NaN
# 2    NaN
# 3     XX
# 4    NaN
# dtype: object

print(s.replace({'A': 'XX'}))
# 0    XX
# 1     B
# 2     C
# 3    XX
# 4     B
# dtype: object

To keep the values that map() does not match, pass the original Series to fillna(). Each NaN is then filled with the original value at the same index.

print(s.map({'A': 'XX'}).fillna(s))
# 0    XX
# 1     B
# 2     C
# 3    XX
# 4     B
# dtype: object
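Another way to keep unmatched values, shown here only as a sketch, is to pass a function to map() instead of the dictionary itself. The get() method of dict returns its second argument when the key is missing, so d.get(x, x) leaves unmatched values as they are. This calls a Python function for every element, so it may be slower than the fillna() approach; it is not part of the measurements in this article.

```python
import pandas as pd

s = pd.Series(['A', 'B', 'C', 'A', 'B'])
d = {'A': 'XX'}

# d.get(x, x) returns d[x] if x is a key of d, otherwise x itself.
kept = s.map(lambda x: d.get(x, x))
print(kept.tolist())
# ['XX', 'B', 'C', 'XX', 'B']
```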

Note that replace() also supports more complex operations, such as using regular expressions to replace parts of strings, or applying different replacements to each column of a DataFrame.
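The following is a minimal sketch of those two replace() features. The sample data (s_fruit, df) is made up for illustration.

```python
import pandas as pd

# With regex=True, the pattern is matched against parts of each string.
s_fruit = pd.Series(['apple', 'banana', 'cherry'])
partial = s_fruit.replace('an', 'AN', regex=True)
print(partial.tolist())
# ['apple', 'bANANa', 'cherry']

# A nested dictionary applies a different replacement to each column:
# in column 'x' replace 'A' with 'XX', in column 'y' replace 'B' with 'YY'.
df = pd.DataFrame({'x': ['A', 'B'], 'y': ['A', 'B']})
per_column = df.replace({'x': {'A': 'XX'}, 'y': {'B': 'YY'}})
print(per_column.values.tolist())
# [['XX', 'A'], ['B', 'YY']]
```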

Speed comparison

The execution times of map() and replace() below are measured with the Jupyter Notebook magic command %%timeit. Note that %%timeit does not work in a regular Python script.
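If you need a similar measurement in a regular script, the timeit module of the standard library is one option. The following is a rough sketch, not the method used for the results in this article; the number of calls n is an arbitrary choice, and the output is omitted because it depends on the environment.

```python
import timeit

import pandas as pd

s = pd.Series(range(100))
d_100 = {i: i * 10 for i in range(100)}

n = 200  # number of calls per measurement (arbitrary)

# timeit.timeit() returns the total time in seconds for n calls.
t_map = timeit.timeit(lambda: s.map(d_100), number=n)
t_replace = timeit.timeit(lambda: s.replace(d_100), number=n)

print(f'map():     {t_map / n * 1e6:.1f} µs per call')
print(f'replace(): {t_replace / n * 1e6:.1f} µs per call')
```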

Consider a Series of 100 values.

s = pd.Series(range(100))

map() is faster than replace() when all values are replaced.

d_100 = {i: i * 10 for i in range(100)}

%%timeit
s.map(d_100)
# 70.7 µs ± 2.08 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
s.replace(d_100)
# 1.31 ms ± 26.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Even when replacing with a dictionary of 50 elements, map() combined with fillna() is faster than replace().

d_50 = {i: i * 10 for i in range(50)}

%%timeit
s.map(d_50).fillna(s)
# 108 µs ± 3.1 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
s.replace(d_50)
# 653 µs ± 3.73 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

However, when replacing with a dictionary of 5 elements, replace() is faster than map() combined with fillna().

d_5 = {i: i * 10 for i in range(5)}

%%timeit
s.map(d_5).fillna(s)
# 104 µs ± 3.85 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
s.replace(d_5)
# 78.5 µs ± 860 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

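The execution time of replace() depends heavily on the size of the dictionary. In the measurements above, it grows from 78.5 µs with 5 keys to 653 µs with 50 keys and 1.31 ms with 100 keys, while map() combined with fillna() stays at roughly 100 µs.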

Results can vary with the execution environment and other factors, so when speed matters, measure both map() and replace() on your actual data before choosing one.
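Speed is not the only thing worth checking in such a test. When map() leaves unmatched values in an integer Series, the NaN values turn the result into float64, and fillna() does not convert it back. The sketch below shows this for the Series used above; casting with astype() is one possible way to restore the original dtype, assuming no NaN remains after fillna().

```python
import pandas as pd

s = pd.Series(range(100))
d_5 = {i: i * 10 for i in range(5)}

# Unmatched values become NaN, so the result is float64 even after fillna().
filled = s.map(d_5).fillna(s)
print(filled.dtype)
# float64

# Cast back to the dtype of the original Series.
restored = filled.astype(s.dtype)
print(restored.dtype)
# int64
```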
