0

Problem

How to create column that resamples 5 minutes in pandas but only till current certain row, so either last 5 minutes from current observation, or 5 minutes interval but if I am at minute 3 to resample last 5 or last 3, point being not to resample by future. DataFrame is ordered by datetime

for example first rows can be disregarded since 5 mins did not accumulate yet:

Desired output

enter image description here


datetime    result  mean
3/1/2022 0:04   1   
3/1/2022 0:05   0   
3/1/2022 0:06   0   
3/1/2022 0:06   1   
3/1/2022 0:07   1   
3/1/2022 0:08   1   
3/1/2022 0:10   1   0.666666667
3/1/2022 0:11   1   0.833333333
3/1/2022 0:12   0   0.8
3/1/2022 0:13   0   0.666666667
3/1/2022 0:14   0   0.4
3/1/2022 0:17   0   0
3/1/2022 0:18   1   0.25
3/1/2022 0:19   1   0.5
3/1/2022 0:20   1   0.75
3/1/2022 0:21   1   0.8
3/1/2022 0:22   1   0.833333333
3/1/2022 0:23   1   1
romrom123
  • 15
  • 6
  • you need to give more details, such as a sample dataset (as text!) and the matching expected output – mozway Apr 03 '22 at 14:39
  • sure here it is (will update original question), basically want to get avarage of last 5 minutes that came before that row (number of rows can change) DateTime result resample mean last 5 min 3/1/2022 0:04 1 3/1/2022 0:05 0 3/1/2022 0:06 0 3/1/2022 0:06 1 3/1/2022 0:07 1 3/1/2022 0:08 1 3/1/2022 0:10 1 0.666666667 3/1/2022 0:11 1 0.833333333 3/1/2022 0:12 0 0.8 3/1/2022 0:13 0 0.666666667 3/1/2022 0:14 0 0.4 3/1/2022 0:17 0 0 3/1/2022 0:18 1 0.25 3/1/2022 0:19 1 0.5 3/1/2022 0:20 1 0.75 3/1/2022 0:21 1 0.8 3/1/2022 0:22 1 0.833333333 3/1/2022 0:23 1 1 – romrom123 Apr 03 '22 at 19:00

1 Answers1

1

You need to use rolling with a closed interval on both bounds.

# ensure datetime type
df['datetime'] = pd.to_datetime(df['datetime'])

# compute the rolling mean
df['resample'] = df.rolling('5min', closed='both', on='datetime')['result'].mean()

If you want NaNs when the interval is less than 5 minutes, you can mask the values:

df['resample2'] = (df.rolling('5min', closed='both', on='datetime')['result'].mean()
                     ​.where(df['datetime'].sub(df['datetime'].iloc[0]).gt('5min'))
                   )

Output:

              datetime  result  resample  resample2
0  2022-03-01 00:04:00       1  1.000000        NaN
1  2022-03-01 00:05:00       0  0.500000        NaN
2  2022-03-01 00:06:00       0  0.333333        NaN
3  2022-03-01 00:06:00       1  0.500000        NaN
4  2022-03-01 00:07:00       1  0.600000        NaN
5  2022-03-01 00:08:00       1  0.666667        NaN
6  2022-03-01 00:10:00       1  0.666667   0.666667
7  2022-03-01 00:11:00       1  0.833333   0.833333
8  2022-03-01 00:12:00       0  0.800000   0.800000
9  2022-03-01 00:13:00       0  0.600000   0.600000
10 2022-03-01 00:14:00       0  0.400000   0.400000
11 2022-03-01 00:17:00       0  0.000000   0.000000
12 2022-03-01 00:18:00       1  0.250000   0.250000
13 2022-03-01 00:19:00       1  0.500000   0.500000
14 2022-03-01 00:20:00       1  0.750000   0.750000
15 2022-03-01 00:21:00       1  0.800000   0.800000
16 2022-03-01 00:22:00       1  0.833333   0.833333
17 2022-03-01 00:23:00       1  1.000000   1.000000
mozway
  • 194,879
  • 13
  • 39
  • 75