How to reset a Pandas DataFrame rolling average on signal?

Question

I am trying to reset a rolling average based on a signal condition. From the point where the condition is true onwards, the previous values should be ignored. This can be done by replacing the history of prior values with the current value at the signal.

import numpy as np  # only needed for the commented-out random-walk variant
import pandas as pd

idx = pd.date_range(start='2000-01-01', end='2000-01-11')
#df = pd.DataFrame(np.random.normal(size=(len(idx),1)).cumsum(), index=idx, columns=['value'])
df = pd.DataFrame(range(0, 11), index=idx, columns=['value'])
df['signal'] = [0] * 5 + [1] + [0] * 5  # the signal fires on 2000-01-06
df['ma'] = df.value.rolling(3).mean()
df['value2'] = [5] * 6 + list(range(6, 11))  # history replaced by the value at the signal
df['ma2'] = df.value2.rolling(3).mean()

            value  signal   ma  value2       ma2
2000-01-01      0       0  NaN       5       NaN
2000-01-02      1       0  NaN       5       NaN
2000-01-03      2       0  1.0       5  5.000000
2000-01-04      3       0  2.0       5  5.000000
2000-01-05      4       0  3.0       5  5.000000
2000-01-06      5       1  4.0       5  5.000000
2000-01-07      6       0  5.0       6  5.333333
2000-01-08      7       0  6.0       7  6.000000
2000-01-09      8       0  7.0       8  7.000000
2000-01-10      9       0  8.0       9  8.000000
2000-01-11     10       0  9.0      10  9.000000

In this example, value2 is the series the moving average should be calculated on once the signal fires, and ma2 is the expected result from index 2000-01-06 onward. Up to 2000-01-05 the result should keep the original 'ma' values (values predating the signal should not be affected).

I found a similar question, "pandas rolling average with a rolling mask / excluding entries", that may be useful, but I can't figure out how to apply it to my need.

Do you want something like this `df['ma2'].where(df['signal'].cummax().astype(bool), df['ma'])` — Scott Boston, Aug 01 '21 at 16:21
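The expression in the comment above assumes ma2 already exists. A minimal sketch of building it for the single-signal case might look like the following. The names after, reset_value and result are illustrative only and do not come from the original post, and the code assumes the signal fires exactly once.

```python
import pandas as pd

idx = pd.date_range(start="2000-01-01", end="2000-01-11")
df = pd.DataFrame({"value": range(11)}, index=idx)
df["signal"] = [0] * 5 + [1] + [0] * 5
df["ma"] = df["value"].rolling(3).mean()

# True on the signal row and every row after it (assumes one signal only).
after = df["signal"].cummax().astype(bool)

# The value on the signal row; it stands in for the whole prior history.
reset_value = df.loc[df["signal"].eq(1), "value"].iloc[0]

# Keep values from the signal onward, overwrite everything before it.
df["value2"] = df["value"].where(after, reset_value)
df["ma2"] = df["value2"].rolling(3).mean()

# Original moving average before the signal, restarted one from the signal on.
df["result"] = df["ma2"].where(after, df["ma"])
print(df)
```

With more than one signal this breaks down, because cummax() only marks the first one.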

score 0 · Answer 1 · answered Aug 01 '21 at 16:01

Effectively, you want to apply() a rolling mean to each groupby() group.
groupby() uses the cumsum() of signal to generate two groups in your example. This assumes a signal of zero is significant.
There is a difference between your incomplete sample output and the way the code works: rolling(4) generates 3 NaN values at the start of each group's rolling mean.

df.loc[:,["ma","ma2"]] = df.groupby(df["signal"].cumsum()).apply(lambda d: d.loc[:,["value","value2"]].rolling(4).mean()).values
df

                     value  signal   ma  value2   ma2
2000-01-01 00:00:00      0       0  NaN       0   NaN
2000-01-02 00:00:00      1       0  NaN       5   NaN
2000-01-03 00:00:00      2       0  NaN       5   NaN
2000-01-04 00:00:00      3       0  1.5       5  3.75
2000-01-05 00:00:00      4       0  2.5       5     5
2000-01-06 00:00:00      5       1  NaN       5   NaN
2000-01-07 00:00:00      6       0  NaN       6   NaN
2000-01-08 00:00:00      7       0  NaN       7   NaN
2000-01-09 00:00:00      8       0  6.5       8   6.5
2000-01-10 00:00:00      9       0  7.5       9   7.5
2000-01-11 00:00:00     10       0  8.5      10   8.5
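To see why the cumsum() of the signal works as a grouping key, the labels it produces can be inspected directly. This is a small sketch; segment and sizes are illustrative names, and the signal is assumed to contain only 0 and 1.

```python
import pandas as pd

signal = pd.Series([0] * 5 + [1] + [0] * 5)

# The running total goes up by one on every signal row, so all rows between
# two signals share the same label.
segment = signal.cumsum()
print(segment.tolist())

# Number of rows in each segment, ordered by label.
sizes = segment.value_counts().sort_index()
print(sizes)
```

Each new label starts on the signal row itself, which is why the rolling calculation restarts there.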

thanks, that was quick! Unfortunately your proposal doesn't fulfill the need. NaN's are not allowed, we need an immediate valid value. That is why we backfill from the signal backwards with the current value. The goal is to restart the ma at the current value. Signal zero is not significant, might as well be NaN. — Boaz, Aug 01 '21 at 16:16
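One way to get the restart described in this comment without NaN after a signal is to pad each segment with copies of its first value before taking the rolling mean. The sketch below is only one possible approach: rolling_mean_with_reset is a hypothetical helper, not a pandas function, and it assumes a unique index and a 0/1 signal (NaN is treated as 0).

```python
import pandas as pd


def rolling_mean_with_reset(values, signal, window):
    """Rolling mean that restarts at each signal row (hypothetical helper)."""
    # 0 before the first signal, then 1, 2, ... from each signal row onward.
    segment = signal.fillna(0).astype(int).cumsum()
    parts = []
    for key, s in values.groupby(segment):
        if key == 0:
            # No signal yet: ordinary rolling mean, leading NaN included.
            out = s.rolling(window).mean()
        else:
            # Pretend the value at the signal also filled the window - 1
            # slots before it, so the first result equals that value.
            pad = pd.Series([s.iloc[0]] * (window - 1), dtype=float)
            padded = pd.concat([pad, s.astype(float)], ignore_index=True)
            out = padded.rolling(window).mean().iloc[window - 1:]
            out.index = s.index
        parts.append(out)
    return pd.concat(parts).reindex(values.index)


idx = pd.date_range(start="2000-01-01", end="2000-01-11")
df = pd.DataFrame({"value": range(11)}, index=idx)
df["signal"] = [0] * 5 + [1] + [0] * 5
df["ma_reset"] = rolling_mean_with_reset(df["value"], df["signal"], window=3)
print(df)
```

Unlike a single where() on a precomputed column, this handles several signals, since every segment is padded independently.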

score 0 · Answer 2 · answered Aug 01 '21 at 17:15

Rob's answer using df.groupby(df["signal"].cumsum()) is nice. If your problem is the remaining NaNs, that's easily fixed once you know the parameters of the rolling method (DataFrame.rolling). See the pandas docs.

It sounds like you want an unlimited window size that produces a value even with a single input. So you just need to replace your rolling(4) with:

rolling(df.shape[0], min_periods=1)

where the df.shape[0] is the number of entries in your dataframe, to make sure the rolling window can be as large as the size of the dataframe.
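Combined with the groupby from the first answer, the suggestion could be sketched as follows. The column names ma_expanding and ma_win3 are illustrative, and transform() is used here instead of apply() only so the result aligns with the original index.

```python
import pandas as pd

idx = pd.date_range(start="2000-01-01", end="2000-01-11")
df = pd.DataFrame({"value": range(11)}, index=idx)
df["signal"] = [0] * 5 + [1] + [0] * 5

segment = df["signal"].cumsum()  # one label per stretch between signals
window = df.shape[0]             # window as long as the whole frame

# Unlimited window within each segment: a per-segment expanding mean.
df["ma_expanding"] = df.groupby(segment)["value"].transform(
    lambda s: s.rolling(window, min_periods=1).mean()
)

# Fixed 3-row window that still yields a value from the first row on.
df["ma_win3"] = df.groupby(segment)["value"].transform(
    lambda s: s.rolling(3, min_periods=1).mean()
)
print(df)
```

Both columns start at 5.0 on the signal row with no NaN. Note that the second value after the signal is 5.5 in both, not the 5.333333 shown in the question's ma2, because min_periods=1 averages only the rows available instead of backfilling the window with the signal value.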

(Posting as an answer because I don't have enough reputation to comment...)
