pandas: Cumulative calculations (cumsum, cumprod, cummax, cummin)

Tags: Python, pandas

In pandas, the cumsum() and cumprod() methods of pandas.DataFrame and pandas.Series calculate the cumulative sum and product.

Additionally, the cummax() and cummin() methods are available for calculating cumulative maximum and minimum.

This article covers the following topics:

  • Cumulative sum and product: cumsum(), cumprod()
    • Basic usage
    • Handling missing values (NaN): skipna
  • Cumulative maximum and minimum: cummax(), cummin()

You can also use Python's standard library itertools and NumPy functions/methods to calculate cumulative sum and product. With itertools, you can apply any function cumulatively.
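
As a rough sketch of those alternatives (the sample lists and variable names here are illustrative and not part of the pandas examples that follow), itertools.accumulate() computes a running sum by default and accepts any two-argument function, while NumPy offers np.cumsum() and np.cumprod():

```python
import itertools
import operator

import numpy as np

values = [1, 2, 3, 4]

# itertools.accumulate() computes a running sum by default
running_sum = list(itertools.accumulate(values))

# Any two-argument function can be passed as the second argument
running_product = list(itertools.accumulate(values, operator.mul))
running_max = list(itertools.accumulate([1, 4, 2, 5], max))

# NumPy equivalents for sum and product
np_sum = np.cumsum(values)
np_prod = np.cumprod(values)

print(running_sum)
# [1, 3, 6, 10]
print(running_product)
# [1, 2, 6, 24]
print(running_max)
# [1, 4, 4, 5]
print(np_sum)
# [ 1  3  6 10]
print(np_prod)
# [ 1  2  6 24]
```

The rest of this article uses the pandas methods, which also preserve the index and column labels.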

Cumulative sum and product: cumsum(), cumprod()

Basic usage

Consider the following pandas.DataFrame as an example:

import pandas as pd

print(pd.__version__)
# 1.0.5

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['X', 'Y', 'Z'])
print(df)
#    A  B
# X  1  4
# Y  2  5
# Z  3  6

By default (axis=0), cumsum() and cumprod() calculate the cumulative sum and product column-wise, accumulating down the rows of each column. To accumulate row-wise, across the columns of each row, set the axis argument to 1.

print(df.cumsum())
#    A   B
# X  1   4
# Y  3   9
# Z  6  15

print(df.cumsum(axis=1))
#    A  B
# X  1  5
# Y  2  7
# Z  3  9

print(df.cumprod())
#    A    B
# X  1    4
# Y  2   20
# Z  6  120

print(df.cumprod(axis=1))
#    A   B
# X  1   4
# Y  2  10
# Z  3  18

pandas.Series also provides cumsum() and cumprod() methods.

print(df['B'])
# X    4
# Y    5
# Z    6
# Name: B, dtype: int64

print(type(df['B']))
# <class 'pandas.core.series.Series'>

print(df['B'].cumsum())
# X     4
# Y     9
# Z    15
# Name: B, dtype: int64

print(df['B'].cumprod())
# X      4
# Y     20
# Z    120
# Name: B, dtype: int64

Handling missing values (NaN): skipna

Consider a pandas.DataFrame containing missing values (NaN):

df_nan = pd.DataFrame({'A': [1, 2, 3], 'B': [4, float('nan'), 6]}, index=['X', 'Y', 'Z'])
print(df_nan)
#    A    B
# X  1  4.0
# Y  2  NaN
# Z  3  6.0

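By default (skipna=True), missing values (NaN) are skipped: the result is NaN at the position of the missing value, and the accumulation continues with the values that follow.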

print(df_nan.cumsum())
#    A     B
# X  1   4.0
# Y  3   NaN
# Z  6  10.0

If you set the skipna argument to False, missing values (NaN) are included in the calculation. Since arithmetic operations with NaN result in NaN, the missing element and every element after it become NaN.

print(float('nan') + 4)
# nan

print(df_nan.cumsum(skipna=False))
#    A    B
# X  1  4.0
# Y  3  NaN
# Z  6  NaN

This applies to cumprod() as well.

print(df_nan.cumprod())
#    A     B
# X  1   4.0
# Y  2   NaN
# Z  6  24.0

print(df_nan.cumprod(skipna=False))
#    A    B
# X  1  4.0
# Y  2  NaN
# Z  6  NaN

pandas.Series behaves the same way.
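
For reference, a minimal sketch with a Series might look like the following. The Series s_nan is a made-up example that mirrors column B of df_nan.

```python
import pandas as pd

# Illustrative Series with one missing value (same values as column B above)
s_nan = pd.Series([4, float('nan'), 6], index=['X', 'Y', 'Z'])

# Default: NaN is skipped and the accumulation continues afterwards
skipped = s_nan.cumsum()

# skipna=False: NaN propagates to all following elements
propagated = s_nan.cumsum(skipna=False)

print(skipped)
# X     4.0
# Y     NaN
# Z    10.0
# dtype: float64

print(propagated)
# X    4.0
# Y    NaN
# Z    NaN
# dtype: float64
```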

Cumulative maximum and minimum: cummax(), cummin()

There are also cummax() and cummin() methods for calculating cumulative maximum and minimum values. These are useful, for example, when calculating the maximum or minimum value up to a certain point in time series data.

Usage is the same as for cumsum() and cumprod(). Calculations are performed column-wise by default and row-wise if axis=1.

df2 = pd.DataFrame({'A': [1, 4, 2], 'B': [6, 3, 5]}, index=['X', 'Y', 'Z'])
print(df2)
#    A  B
# X  1  6
# Y  4  3
# Z  2  5

print(df2.cummax())
#    A  B
# X  1  6
# Y  4  6
# Z  4  6

print(df2.cummax(axis=1))
#    A  B
# X  1  6
# Y  4  4
# Z  2  5

print(df2.cummin())
#    A  B
# X  1  6
# Y  1  3
# Z  1  3

print(df2.cummin(axis=1))
#    A  B
# X  1  1
# Y  4  3
# Z  2  2

Handling of missing values (NaN) is also the same as for cumsum() and cumprod(), and the skipna argument can be specified.

df2_nan = pd.DataFrame({'A': [1, 4, 2], 'B': [6, float('nan'), 5]}, index=['X', 'Y', 'Z'])
print(df2_nan)
#    A    B
# X  1  6.0
# Y  4  NaN
# Z  2  5.0

print(df2_nan.cummax())
#    A    B
# X  1  6.0
# Y  4  NaN
# Z  4  6.0

print(df2_nan.cummax(skipna=False))
#    A    B
# X  1  6.0
# Y  4  NaN
# Z  4  NaN

print(df2_nan.cummin())
#    A    B
# X  1  6.0
# Y  1  NaN
# Z  1  5.0

print(df2_nan.cummin(skipna=False))
#    A    B
# X  1  6.0
# Y  1  NaN
# Z  1  NaN

pandas.Series also supports cummax() and cummin(), with the same skipna behavior.
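
As an illustration of the time series use case mentioned at the start of this section, the sketch below tracks the highest and lowest value seen so far in a date-indexed Series. The readings, dates, and variable names are hypothetical and chosen only for the example.

```python
import pandas as pd

# Hypothetical daily readings with one missing day; values are made up
readings = pd.Series(
    [10.0, 12.0, float('nan'), 15.0, 11.0],
    index=pd.date_range('2024-01-01', periods=5, freq='D'),
)

# Running maximum and minimum; NaN is skipped by default
high_so_far = readings.cummax()
low_so_far = readings.cummin()

# With skipna=False, everything from the missing day onward becomes NaN
high_strict = readings.cummax(skipna=False)

print(high_so_far.tolist())
# [10.0, 12.0, nan, 15.0, 15.0]
print(low_so_far.tolist())
# [10.0, 10.0, nan, 10.0, 10.0]
print(high_strict.tolist())
# [10.0, 12.0, nan, nan, nan]
```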
