0

I'm trying to calculate a moving sum in Python based off days, however I don't have every date listed in my data frame. In the example below the cum sum should account for the fact that the 2021-01-03 date is missing from the data frame

import pandas as pd

df = pd.DataFrame({'date':['2021-01-01', '2021-01-02', '2021-01-04', '2021-01-05'],
                   'value':[1, 1, 1, 1]})

df_correct = pd.DataFrame({'date':['2021-01-01', '2021-01-02', '2021-01-04', '2021-01-05'],
                           'value':[1, 1, 1, 1],
                           '2_day_cum_sum':[1, 2, 1, 2]})

print(df_correct)

I know how to calculate a rolling sum using the rolling function from pandas, but I don't know how to account for the missing date. My current vision is to create a data frame that has every combination of date and other variables (not in this basic example), and then merge the active data set to that, so I can effectively use the rolling function from pandas. However, I feel like there is a better approach that exists. Any thoughts?

mf17
  • 91
  • 6
  • Perhaps you could interpolate the values? Not sure how that affects your results though. – Jeppe Sep 16 '21 at 05:49

1 Answers1

1

I know you mentioned about the rolling function, not sure if you know that you could specify a date window in the rolling function.. Let's try:

# ensure date is in right format
df['date'] = pd.to_datetime(df.date)

# set date as index
df = df.set_index('date')

# rolling sum with 2 days as window
df['2_day_cum_sum'] = df['value'].rolling('2d').sum()

# reset index
df = df.reset_index()

I'm able to get your expected output and I think this suffices for your use, unless you'd like to input a different value for when the date is missing - e.g. set value as 5 if date is missing. If that is so then we'll need a different solution

kelvt
  • 949
  • 6
  • 16