pandas: Cumulative calculations (cumsum, cumprod, cummax, cummin)
In pandas, you can calculate the cumulative sum and product using the cumsum() and cumprod() methods of pandas.DataFrame and Series.
- pandas.DataFrame.cumsum — pandas 1.5.3 documentation
- pandas.DataFrame.cumprod — pandas 1.5.3 documentation
Additionally, the cummax() and cummin() methods calculate the cumulative maximum and minimum.
- pandas.DataFrame.cummax — pandas 1.5.3 documentation
- pandas.DataFrame.cummin — pandas 1.5.3 documentation
This article covers the following topics:
- Cumulative sum and product: cumsum(), cumprod()
  - Basic usage
  - Handling missing values (NaN): skipna
- Cumulative maximum and minimum: cummax(), cummin()
You can also use Python's standard library itertools and NumPy functions/methods to calculate cumulative sum and product. With itertools, you can apply any function cumulatively.
- Calculate cumulative sum and product in Python (itertools.accumulate)
- NumPy: Calculate cumulative sum and product (np.cumsum, np.cumprod)
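As a rough illustration of those alternatives, the sketch below applies itertools.accumulate with its default addition, with operator.mul, and with the built-in max, then repeats the sum and product with NumPy. The list values are made up for this example.

```python
import itertools
import operator

import numpy as np

values = [1, 2, 3, 4]  # sample data, chosen only for illustration

# itertools.accumulate defaults to a running sum
running_sum = list(itertools.accumulate(values))
print(running_sum)
# [1, 3, 6, 10]

# Any two-argument function can be applied cumulatively
running_product = list(itertools.accumulate(values, operator.mul))
print(running_product)
# [1, 2, 6, 24]

running_max = list(itertools.accumulate([1, 4, 2, 5], max))
print(running_max)
# [1, 4, 4, 5]

# NumPy equivalents for the sum and product
np_sum = np.cumsum(values).tolist()
np_prod = np.cumprod(values).tolist()
print(np_sum, np_prod)
# [1, 3, 6, 10] [1, 2, 6, 24]
```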
Cumulative sum and product: cumsum(), cumprod()
Basic usage
Consider the following pandas.DataFrame as an example:
import pandas as pd
print(pd.__version__)
# 1.0.5
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['X', 'Y', 'Z'])
print(df)
# A B
# X 1 4
# Y 2 5
# Z 3 6
By default, cumsum() and cumprod() calculate the cumulative sum and product column-wise. To perform the calculation row-wise, set the axis argument to 1.
print(df.cumsum())
# A B
# X 1 4
# Y 3 9
# Z 6 15
print(df.cumsum(axis=1))
# A B
# X 1 5
# Y 2 7
# Z 3 9
print(df.cumprod())
# A B
# X 1 4
# Y 2 20
# Z 6 120
print(df.cumprod(axis=1))
# A B
# X 1 4
# Y 2 10
# Z 3 18
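One way to sanity-check the results above is to compare the final cumulative values with the ordinary totals from sum() and prod(). The following is a minimal sketch using the same example data; the variable names are chosen here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['X', 'Y', 'Z'])

# Column-wise: the last row of cumsum() holds each column's total
last_cumsum = df.cumsum().iloc[-1].tolist()
print(last_cumsum, df.sum().tolist())
# [6, 15] [6, 15]

# The same holds for cumprod() and prod()
last_cumprod = df.cumprod().iloc[-1].tolist()
print(last_cumprod, df.prod().tolist())
# [6, 120] [6, 120]

# Row-wise: the last column of cumsum(axis=1) holds each row's total
last_col = df.cumsum(axis=1).iloc[:, -1].tolist()
print(last_col, df.sum(axis=1).tolist())
# [5, 7, 9] [5, 7, 9]
```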
pandas.Series also provides the cumsum() and cumprod() methods.
print(df['B'])
# X 4
# Y 5
# Z 6
# Name: B, dtype: int64
print(type(df['B']))
# <class 'pandas.core.series.Series'>
print(df['B'].cumsum())
# X 4
# Y 9
# Z 15
# Name: B, dtype: int64
print(df['B'].cumprod())
# X 4
# Y 20
# Z 120
# Name: B, dtype: int64
Handling missing values (NaN): skipna
Consider a pandas.DataFrame containing missing values (NaN):
df_nan = pd.DataFrame({'A': [1, 2, 3], 'B': [4, float('nan'), 6]}, index=['X', 'Y', 'Z'])
print(df_nan)
# A B
# X 1 4.0
# Y 2 NaN
# Z 3 6.0
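By default, missing values (NaN) are skipped: each NaN stays NaN in the result but is excluded from the running total, so the values after it are still accumulated.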
print(df_nan.cumsum())
# A B
# X 1 4.0
# Y 3 NaN
# Z 6 10.0
If you set the skipna argument to False, missing values (NaN) are included in the calculation. Since any arithmetic operation with NaN results in NaN, every element from the first NaN onward becomes NaN.
print(float('nan') + 4)
# nan
print(df_nan.cumsum(skipna=False))
# A B
# X 1 4.0
# Y 3 NaN
# Z 6 NaN
This applies to cumprod() as well.
print(df_nan.cumprod())
# A B
# X 1 4.0
# Y 2 NaN
# Z 6 24.0
print(df_nan.cumprod(skipna=False))
# A B
# X 1 4.0
# Y 2 NaN
# Z 6 NaN
The same behavior is observed for pandas.Series, but examples are not shown here for brevity.
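For reference, a minimal sketch of the Series case might look like the following. The Series s_nan is made up for this example and mirrors column B of df_nan; results are printed as lists to keep the output compact.

```python
import pandas as pd

# Hypothetical Series mirroring column B of df_nan
s_nan = pd.Series([4, float('nan'), 6], index=['X', 'Y', 'Z'])

# Default (skipna=True): NaN stays NaN, later values are still accumulated
sum_skip = s_nan.cumsum()
print(sum_skip.tolist())
# [4.0, nan, 10.0]

# skipna=False: everything from the first NaN onward becomes NaN
sum_noskip = s_nan.cumsum(skipna=False)
print(sum_noskip.tolist())
# [4.0, nan, nan]

prod_skip = s_nan.cumprod()
print(prod_skip.tolist())
# [4.0, nan, 24.0]
```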
Cumulative maximum and minimum: cummax(), cummin()
The cummax() and cummin() methods calculate the cumulative maximum and minimum. They are useful, for example, for finding the highest or lowest value observed up to each point in time series data.
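As a sketch of the time series use case, the example below tracks the highest and lowest value seen so far in a small daily series. The prices and dates are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical daily values, made up for this example
prices = pd.Series(
    [100, 105, 102, 110, 108],
    index=pd.date_range('2023-01-01', periods=5, freq='D'),
)

# Highest and lowest value observed up to each date
running_high = prices.cummax()
running_low = prices.cummin()
print(running_high.tolist())
# [100, 105, 105, 110, 110]
print(running_low.tolist())
# [100, 100, 100, 100, 100]

# Gap between each value and the highest value seen so far
gap = running_high - prices
print(gap.tolist())
# [0, 0, 3, 0, 2]
```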
Usage is the same as for cumsum() and cumprod(). Calculations are performed column-wise by default and row-wise if axis=1.
df2 = pd.DataFrame({'A': [1, 4, 2], 'B': [6, 3, 5]}, index=['X', 'Y', 'Z'])
print(df2)
# A B
# X 1 6
# Y 4 3
# Z 2 5
print(df2.cummax())
# A B
# X 1 6
# Y 4 6
# Z 4 6
print(df2.cummax(axis=1))
# A B
# X 1 6
# Y 4 4
# Z 2 5
print(df2.cummin())
# A B
# X 1 6
# Y 1 3
# Z 1 3
print(df2.cummin(axis=1))
# A B
# X 1 1
# Y 4 3
# Z 2 2
Missing values (NaN) are also handled in the same way as for cumsum() and cumprod(), and the skipna argument can be specified.
df2_nan = pd.DataFrame({'A': [1, 4, 2], 'B': [6, float('nan'), 5]}, index=['X', 'Y', 'Z'])
print(df2_nan)
# A B
# X 1 6.0
# Y 4 NaN
# Z 2 5.0
print(df2_nan.cummax())
# A B
# X 1 6.0
# Y 4 NaN
# Z 4 6.0
print(df2_nan.cummax(skipna=False))
# A B
# X 1 6.0
# Y 4 NaN
# Z 4 NaN
print(df2_nan.cummin())
# A B
# X 1 6.0
# Y 1 NaN
# Z 1 5.0
print(df2_nan.cummin(skipna=False))
# A B
# X 1 6.0
# Y 1 NaN
# Z 1 NaN
pandas.Series also supports the cummax() and cummin() methods, but examples are omitted for brevity.
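For completeness, a minimal sketch of the Series case could look like this. The Series s2_nan is made up for this example and mirrors column B of df2_nan; results are printed as lists to keep the output compact.

```python
import pandas as pd

# Hypothetical Series mirroring column B of df2_nan
s2_nan = pd.Series([6, float('nan'), 5], index=['X', 'Y', 'Z'])

# Default (skipna=True): NaN stays NaN, later values are still compared
max_skip = s2_nan.cummax()
min_skip = s2_nan.cummin()
print(max_skip.tolist())
# [6.0, nan, 6.0]
print(min_skip.tolist())
# [6.0, nan, 5.0]

# skipna=False: everything from the first NaN onward becomes NaN
max_noskip = s2_nan.cummax(skipna=False)
min_noskip = s2_nan.cummin(skipna=False)
print(max_noskip.tolist())
# [6.0, nan, nan]
print(min_noskip.tolist())
# [6.0, nan, nan]
```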