pandas: Apply functions to values, rows, columns with map(), apply()

Modified: 2024-01-17 | Tags: Python, pandas

In pandas, you can use map(), apply(), and applymap() methods to apply functions to values (element-wise), rows, or columns in DataFrames and Series.

Contents

Apply functions to values in Series: map(), apply()
- How to use map()
- How to use apply()
Apply functions to values in DataFrame: map(), applymap()
Apply functions to rows and columns in DataFrame: apply()
Use methods of DataFrame and Series, and arithmetic Operators
Use NumPy functions
Speed comparison

As described later, DataFrame and Series already provide methods for common operations, and NumPy functions can be applied to them directly. Prefer those dedicated methods or NumPy functions over map() or apply(), because they are considerably faster.

The pandas and NumPy versions used in this article are as follows. Note that functionality may vary between versions.

import pandas as pd
import numpy as np

print(pd.__version__)
# 2.1.2

print(np.__version__)
# 1.26.1

source: pandas_numpy_function.py

Apply functions to values in `Series`: `map()`, `apply()`

To apply a function to each value in a Series (element-wise), use the map() or apply() methods.

How to use `map()`

Passing a function to map() returns a new Series, with the function applied to each value. For example, apply the built-in hex() function to convert integers to hexadecimal strings.

Convert binary, octal, decimal, and hexadecimal in Python

s = pd.Series([1, 10, 100])
print(s)
# 0      1
# 1     10
# 2    100
# dtype: int64

print(s.map(hex))
# 0     0x1
# 1     0xa
# 2    0x64
# dtype: object

source: pandas_series_map_apply.py

You can also apply functions defined with def or lambda expressions.

def my_func(x):
    return x * 10

print(s.map(my_func))
# 0      10
# 1     100
# 2    1000
# dtype: int64

print(s.map(lambda x: x * 10))
# 0      10
# 1     100
# 2    1000
# dtype: int64

source: pandas_series_map_apply.py

The above example is for illustrative purposes; simple arithmetic operations can be directly performed on a Series.

print(s * 10)
# 0      10
# 1     100
# 2    1000
# dtype: int64

source: pandas_series_map_apply.py

By default, missing values (NaN) are passed to the function, but if you set the second argument na_action to 'ignore', NaN will not be passed to the function and the result will remain as NaN.

Because the presence of NaN changes the data type (dtype) to a floating-point number (float), values are converted to integers (int) using int() before being passed to hex() in the following example.

s_nan = pd.Series([1, float('nan'), 100])
print(s_nan)
# 0      1.0
# 1      NaN
# 2    100.0
# dtype: float64

# print(s_nan.map(lambda x: hex(int(x))))
# ValueError: cannot convert float NaN to integer

print(s_nan.map(lambda x: hex(int(x)), na_action='ignore'))
# 0     0x1
# 1     NaN
# 2    0x64
# dtype: object

source: pandas_series_map_apply.py

You can also pass a dictionary (dict) to map(). In this case, it replaces values. For more details, refer to the following article.

pandas: Replace Series values with map()
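As a minimal sketch of the dictionary form (the fruit names are made-up sample data, not from the linked article), each value is looked up as a key, and values with no matching key become NaN:

```python
import pandas as pd

s_fruit = pd.Series(['a', 'b', 'c'])

# Keys are the original values, dict values are the replacements.
# 'c' has no entry in the dict, so it becomes a missing value (NaN).
mapped = s_fruit.map({'a': 'apple', 'b': 'banana'})
print(mapped)
```

If unmatched values should be kept as they are rather than turned into NaN, the Series replace() method is the usual alternative.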

How to use `apply()`

Similar to map(), the function specified as the first argument in apply() is applied to each value. The difference is that apply() allows you to specify arguments to be passed to the function.

With map(), you need to use a lambda expression or similar approach to pass arguments to the function. For example, specify the base argument in the int() function, which converts strings to integers.

s = pd.Series(['11', 'AA', 'FF'])
print(s)
# 0    11
# 1    AA
# 2    FF
# dtype: object

# print(s.map(int, base=16))
# TypeError: Series.map() got an unexpected keyword argument 'base'

print(s.map(lambda x: int(x, 16)))
# 0     17
# 1    170
# 2    255
# dtype: int64

source: pandas_series_map_apply.py

With apply(), any specified keyword arguments are passed directly to the function. It is also possible to specify positional arguments using the args argument.

print(s.apply(int, base=16))
# 0     17
# 1    170
# 2    255
# dtype: int64

print(s.apply(int, args=(16,)))
# 0     17
# 1    170
# 2    255
# dtype: int64

source: pandas_series_map_apply.py

Note that even if there is only one positional argument, it must be specified as a tuple or list in the args argument. A comma is necessary at the end of a one-element tuple.

A tuple with one element requires a comma in Python

As of version 2.1.2, apply() does not have the na_action argument.
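One way to get the same effect as na_action='ignore' with apply() is to check for missing values inside the function itself. The following is a minimal sketch; to_hex is a hypothetical helper name introduced only for this illustration.

```python
import pandas as pd

s_nan = pd.Series([1, float('nan'), 100])

def to_hex(x):
    # Hypothetical helper: return missing values unchanged,
    # convert everything else to a hexadecimal string.
    if pd.isna(x):
        return x
    return hex(int(x))

result = s_nan.apply(to_hex)
print(result)
```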

Apply functions to values in `DataFrame`: `map()`, `applymap()`

To apply a function to each value in a DataFrame (element-wise), use the map() or applymap() methods.

In pandas 2.1.0, applymap() of DataFrame was renamed to map(), and the old name was deprecated.

As of version 2.1.2, applymap() still works but issues a FutureWarning.

df = pd.DataFrame([[1, 10, 100], [2, 20, 200]])
print(df)
#    0   1    2
# 0  1  10  100
# 1  2  20  200

print(df.map(hex))
#      0     1     2
# 0  0x1   0xa  0x64
# 1  0x2  0x14  0xc8

print(df.applymap(hex))
#      0     1     2
# 0  0x1   0xa  0x64
# 1  0x2  0x14  0xc8
# 
# FutureWarning: DataFrame.applymap has been deprecated. Use DataFrame.map instead.

source: pandas_dataframe_map_applymap.py

The following examples use map(), but applymap() has the same usage and functionality. In versions before 2.1.0, use applymap() instead.

As with map() of Series, the na_action argument can be specified for map() of DataFrame. By default, missing values (NaN) are passed to the function, but if na_action is set to 'ignore', NaN is not passed to the function and the result remains as NaN.

df_nan = pd.DataFrame([[1, float('nan'), 100], [2, 20, 200]])
print(df_nan)
#    0     1    2
# 0  1   NaN  100
# 1  2  20.0  200

# print(df_nan.map(lambda x: hex(int(x))))
# ValueError: cannot convert float NaN to integer

print(df_nan.map(lambda x: hex(int(x)), na_action='ignore'))
#      0     1     2
# 0  0x1   NaN  0x64
# 1  0x2  0x14  0xc8

source: pandas_dataframe_map_applymap.py

As of version 2.1.2, unlike map() of Series, map() of DataFrame passes any specified keyword arguments to the function.

df = pd.DataFrame([['1', 'A', 'F'], ['11', 'AA', 'FF']])
print(df)
#     0   1   2
# 0   1   A   F
# 1  11  AA  FF

print(df.map(int, base=16))
#     0    1    2
# 0   1   10   15
# 1  17  170  255

source: pandas_dataframe_map_applymap.py

As of version 2.1.2, map() of DataFrame does not have the args argument, which means you cannot specify positional arguments.
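When a positional argument is needed, wrapping the call in a lambda expression is one workaround. The sketch below also falls back to applymap() when map() is not available on DataFrame, on the assumption that the code may run on a pandas version older than 2.1.0; the name elementwise is introduced here for illustration only.

```python
import pandas as pd

df_str = pd.DataFrame([['1', 'A', 'F'], ['11', 'AA', 'FF']])

# Use DataFrame.map() where it exists (pandas 2.1.0 and later),
# otherwise fall back to the older DataFrame.applymap().
elementwise = df_str.map if hasattr(df_str, 'map') else df_str.applymap

# The lambda supplies 16 as the second positional argument of int().
result = elementwise(lambda x: int(x, 16))
print(result)
```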

Apply functions to rows and columns in `DataFrame`: `apply()`

To apply a function to rows or columns in a DataFrame, use the apply() method.

pandas.DataFrame.apply — pandas 2.1.3 documentation

For the agg() method, which applies multiple operations at once, see the following article.

pandas: Aggregate data with agg(), aggregate()

Basic usage

Specify the function you want to apply as the first argument.

Note that the built-in sum() function is used for explanation purposes, but if you need to calculate a sum, it is better to use the sum() method mentioned later.

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])
print(df)
#     A   B   C
# X  10  20  30
# Y  40  50  60

print(df.apply(sum))
# A    50
# B    70
# C    90
# dtype: int64

source: pandas_dataframe_apply.py

By default, each column is passed to the function as a Series. If the function cannot accept a Series as an argument, an error will occur.

print(df.apply(lambda x: type(x)))
# A    <class 'pandas.core.series.Series'>
# B    <class 'pandas.core.series.Series'>
# C    <class 'pandas.core.series.Series'>
# dtype: object

# print(hex(df['A']))
# TypeError: 'Series' object cannot be interpreted as an integer

# print(df.apply(hex))
# TypeError: 'Series' object cannot be interpreted as an integer

source: pandas_dataframe_apply.py

Specify rows or columns: `axis`

By default, the function is applied to each column. However, setting the axis argument to 1 or 'columns' applies it to each row.

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])
print(df)
#     A   B   C
# X  10  20  30
# Y  40  50  60

print(df.apply(sum, axis=1))
# X     60
# Y    150
# dtype: int64

source: pandas_dataframe_apply.py
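The same function can be applied in either direction just by changing axis. As a small sketch with a lambda that computes the range (maximum minus minimum) of whatever Series it receives:

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])

# Default (axis=0): the lambda receives each column as a Series.
col_range = df.apply(lambda s: s.max() - s.min())

# axis=1: the lambda receives each row as a Series.
row_range = df.apply(lambda s: s.max() - s.min(), axis=1)

print(col_range)
print(row_range)
```

The result is indexed by column names in the first case and by row names in the second.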

Specify arguments for the function: Keyword arguments, `args`

Any keyword arguments specified in apply() are passed to the function being applied. You can also specify positional arguments using the args argument.

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])
print(df)
#     A   B   C
# X  10  20  30
# Y  40  50  60

def my_func(x, y, z):
    return sum(x) + y + z * 2

print(df.apply(my_func, y=100, z=1000))
# A    2150
# B    2170
# C    2190
# dtype: int64

print(df.apply(my_func, args=(100, 1000)))
# A    2150
# B    2170
# C    2190
# dtype: int64

source: pandas_dataframe_apply.py
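Positional and keyword arguments can also be combined in one call. A minimal sketch, reusing the same my_func definition as above, with y passed through args and z passed as a keyword argument:

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])

def my_func(x, y, z):
    return sum(x) + y + z * 2

# y is supplied positionally via args, z by keyword.
result = df.apply(my_func, args=(100,), z=1000)
print(result)
```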

Pass as `ndarray` instead of `Series`: `raw`

By default, each row or column is passed as a Series. If you set the raw argument to True, they are passed as NumPy arrays (ndarray).

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])
print(df)
#     A   B   C
# X  10  20  30
# Y  40  50  60

print(df.apply(lambda x: type(x), raw=True))
# A    <class 'numpy.ndarray'>
# B    <class 'numpy.ndarray'>
# C    <class 'numpy.ndarray'>
# dtype: object

source: pandas_dataframe_apply.py

If there's no need for a Series, using raw=True is faster since the conversion process is omitted. However, if the function requires Series methods or attributes, setting raw=True will raise an error.

print(df.apply(lambda x: x.name * 3))
# A    AAA
# B    BBB
# C    CCC
# dtype: object

# print(df.apply(lambda x: x.name * 3, raw=True))
# AttributeError: 'numpy.ndarray' object has no attribute 'name'

source: pandas_dataframe_apply.py
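A function that relies only on operations shared by Series and ndarray works with either setting. As a sketch, value_range below is a hypothetical helper that uses max() and min(), which both types provide:

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])

def value_range(values):
    # Works for both Series and ndarray, since both have max() and min().
    return values.max() - values.min()

as_series = df.apply(value_range)             # each column passed as a Series
as_ndarray = df.apply(value_range, raw=True)  # each column passed as an ndarray

print(as_series)
print(as_ndarray)
```

Both calls return a Series indexed by column name, so raw=True can be switched on for such functions without changing the rest of the code.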

Apply functions to specific rows or columns

To apply a function to a specific row or column, extract the row or column as a Series and use the map() or apply() methods of Series.

pandas: Select rows/columns by index (numbers and names)

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])
print(df)
#     A   B   C
# X  10  20  30
# Y  40  50  60

print(df['A'].map(lambda x: x**2))
# X     100
# Y    1600
# Name: A, dtype: int64

print(df.loc['Y'].map(hex))
# A    0x28
# B    0x32
# C    0x3c
# Name: Y, dtype: object

source: pandas_dataframe_apply.py

You can assign the results back to the DataFrame as new rows or columns. If you specify an existing row or column name, that row or column is overwritten.

pandas: Add rows/columns to DataFrame with assign(), insert()

df['A'] = df['A'].map(lambda x: x**2)
df.loc['Y_hex'] = df.loc['Y'].map(hex)
print(df)
#            A     B     C
# X        100    20    30
# Y       1600    50    60
# Y_hex  0x640  0x32  0x3c

source: pandas_dataframe_apply.py
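If the original DataFrame should stay unchanged, the assign() method returns a new DataFrame with the extra column instead of modifying the object in place. A minimal sketch, where the column name A_squared is chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame([[10, 20, 30], [40, 50, 60]], index=['X', 'Y'], columns=['A', 'B', 'C'])

# assign() returns a new DataFrame; df itself is not modified.
df_new = df.assign(A_squared=df['A'].map(lambda x: x ** 2))

print(df_new)
print(df.columns.tolist())
```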

Use methods of `DataFrame` and `Series`, and arithmetic Operators

In pandas, common operations are provided as methods for DataFrame and Series, so there's no need to use map() or apply().

df = pd.DataFrame([[1, -2, 3], [-4, 5, -6]], index=['X', 'Y'], columns=['A', 'B', 'C'])
print(df)
#    A  B  C
# X  1 -2  3
# Y -4  5 -6

print(df.abs())
#    A  B  C
# X  1  2  3
# Y  4  5  6

print(df.sum())
# A   -3
# B    3
# C   -3
# dtype: int64

print(df.sum(axis=1))
# X    2
# Y   -5
# dtype: int64

source: pandas_numpy_function.py

For a list of available methods, refer to the official documentation.

You can also process DataFrame and Series directly using arithmetic operators.

print(df * 10)
#     A   B   C
# X  10 -20  30
# Y -40  50 -60

print(df['A'].abs() + df['B'] * 100)
# X   -199
# Y    504
# dtype: int64

source: pandas_numpy_function.py

Methods for string manipulation are also available through the str accessor of Series.

pandas: Handle strings (replace, strip, case conversion, etc.)

df = pd.DataFrame([['a', 'ab', 'abc'], ['x', 'xy', 'xyz']], index=['X', 'Y'], columns=['A', 'B', 'C'])
print(df)
#    A   B    C
# X  a  ab  abc
# Y  x  xy  xyz

print(df['A'] + '-' + df['B'].str.upper() + '-' + df['C'].str.title())
# X    a-AB-Abc
# Y    x-XY-Xyz
# dtype: object

source: pandas_numpy_function.py

Use NumPy functions

You can process DataFrame and Series by passing them to NumPy functions.

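For example, although pandas does not provide a method for rounding down to the nearest integer (floor), you can use np.floor() instead. Note that np.floor() rounds toward negative infinity, so -0.1 becomes -1.0. For DataFrame, a DataFrame is returned; for Series, a Series is returned.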

NumPy: Round up/down array elements (np.floor, np.trunc, np.ceil)

df = pd.DataFrame([[0.1, 0.5, 0.9], [-0.1, -0.5, -0.9]], index=['X', 'Y'], columns=['A', 'B', 'C'])
print(df)
#      A    B    C
# X  0.1  0.5  0.9
# Y -0.1 -0.5 -0.9

print(np.floor(df))
#      A    B    C
# X  0.0  0.0  0.0
# Y -1.0 -1.0 -1.0

print(type(np.floor(df)))
# <class 'pandas.core.frame.DataFrame'>

print(np.floor(df['A']))
# X    0.0
# Y   -1.0
# Name: A, dtype: float64

print(type(np.floor(df['A'])))
# <class 'pandas.core.series.Series'>

source: pandas_numpy_function.py
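Rounding down and truncating differ for negative values: np.floor() rounds toward negative infinity, while np.trunc() simply discards the fractional part. A small sketch comparing the two on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0.1, 0.5, 0.9], [-0.1, -0.5, -0.9]], index=['X', 'Y'], columns=['A', 'B', 'C'])

floored = np.floor(df)    # row Y becomes -1.0 (rounded toward negative infinity)
truncated = np.trunc(df)  # row Y becomes -0.0 (fractional part discarded)

print(floored)
print(truncated)
```

Both functions return a DataFrame with the original row and column labels.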

It is also possible to specify the axis argument in the NumPy function.

print(np.sum(df, axis=0))
# A    0.0
# B    0.0
# C    0.0
# dtype: float64

print(np.sum(df, axis=1))
# X    1.5
# Y   -1.5
# dtype: float64

print(type(np.sum(df, axis=0)))
# <class 'pandas.core.series.Series'>

source: pandas_numpy_function.py

Speed comparison

Compare the processing speeds of the map() and apply() methods of DataFrame with other dedicated methods and NumPy functions.

Consider a DataFrame with 100 rows and 100 columns.

df = pd.DataFrame(np.arange(-5000, 5000).reshape(100, 100))

print(df.shape)
# (100, 100)

source: pandas_map_apply_timeit.py

Note that the following examples use the %%timeit magic command in Jupyter Notebook. They won't work if executed as a Python script.

Measure execution time with timeit in Python

The results for using the built-in abs() function with map(), compared to using the abs() method of DataFrame and the np.abs() function, are as follows. In this measurement, map() takes milliseconds while the other two take microseconds, making it several hundred times slower.

%%timeit
df.map(abs)
# 2.07 ms ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
df.abs()
# 5.06 µs ± 55 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

%%timeit
np.abs(df)
# 7.81 µs ± 120 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

source: pandas_map_apply_timeit.py

The results for using the built-in sum() function with apply(), compared to using the sum() method of DataFrame and the np.sum() function, are as follows. In this measurement, apply() is more than 20 times slower. Setting raw=True roughly halves the time, but it is still about 12 times slower than sum() of DataFrame or np.sum().

%%timeit
df.apply(sum)
# 932 µs ± 95.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit
df.apply(sum, raw=True)
# 427 µs ± 4.8 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

%%timeit
df.sum()
# 35 µs ± 140 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

%%timeit
np.sum(df, axis=0)
# 37.3 µs ± 66.9 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

source: pandas_map_apply_timeit.py

Reserve the map() and apply() methods for complex operations that cannot be expressed with other methods or NumPy functions. Where an equivalent method or NumPy function exists, use it instead.
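Outside Jupyter Notebook, a comparable measurement can be scripted with the timeit module from the standard library. The following is a rough sketch, not a rigorous benchmark: the repeat count n is an arbitrary choice, the applymap() fallback is an assumption for pandas versions before 2.1.0, and absolute timings depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(-5000, 5000).reshape(100, 100))

# Use DataFrame.map() where available, otherwise the older applymap().
elementwise = df.map if hasattr(df, 'map') else df.applymap

n = 20  # number of repetitions per measurement (arbitrary)

# Average seconds per call for each approach.
t_map = timeit.timeit(lambda: elementwise(abs), number=n) / n
t_method = timeit.timeit(lambda: df.abs(), number=n) / n
t_numpy = timeit.timeit(lambda: np.abs(df), number=n) / n

print(f'map(abs):   {t_map * 1e6:.1f} µs per call')
print(f'df.abs():   {t_method * 1e6:.1f} µs per call')
print(f'np.abs(df): {t_numpy * 1e6:.1f} µs per call')
```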
