2

I've looked through the forums and can't seem to figure this out. I have the following data. I assume the answer lies in the "groupby" function but I can't seem to work it out.

Date     Hour    Value   3DAverage
1/1       1       57      53.33
1/1       2       43      42.33
1/1       3       44      45.33
1/2       1       51      ...
1/2       2       40      ...
1/2       3       42      ...
1/3       1       56      ...
1/3       2       42
1/3       3       48
1/4       1       53
1/4       2       45
1/4       3       46
1/5       1       56
1/5       2       46
1/5       3       48
1/5       4       64 *       
1/6       1       50
1/6       2       41
1/6       3       42
1/7       1       57
1/7       2       43
1/7       3       45
1/8       1       58
1/8       2       49
1/8       3       41
1/9       1       53
1/9       2       46
1/9       3       47
1/10      1       58
1/10      2       49
1/10      3       40

What I am trying to do is add the "3DAverage" column. I would like this column to produce an average of the "Value" column for the PRIOR 3 corresponding hour values. I want to fill this column down for the entire series. For example, the value 53.33 is an average of the value for hour 1 on 1/2, 1/3, and 1/4. I would like this to continue down the column using only the prior 3 values for each "HourValue".

Also, please note that there are instances such as 1/5 hour 4. Not all dates have the same number of hours, so I am looking for the last 3 hour values for dates in which those hours exist.

I hope that makes sense. Thanks so much in advance for your help !

Frank Drin
  • 1,613
  • 2
  • 13
  • 18

2 Answers2

2

You can try rolling mean

df['3D Average'] = df.iloc[::-1].groupby('Hour').Value.rolling(window = 3).mean()\
.shift().sort_index(level = 1).values
Vaishali
  • 37,545
  • 5
  • 58
  • 86
  • Awesome, this is so sweeeeeeeet. Thanks ! – Frank Drin Mar 22 '18 at 20:36
  • Hi Vaishali, to confirm, this will still work if some dates are missing hours correct ? It is not dependent on a 24 hour sequence for each date I presume. Also, I am trying to loop this for multiple Value columns, but I cant use the constituents of a dictionary in the Value, it breaks. – Frank Drin Mar 23 '18 at 20:44
1

You can groupby on Date column and do the following:

df['3DAverage'] = df['Hour'].map(df.groupby('Hour').apply(lambda x: x.loc[x['Date'].isin(['1/2','1/3','1/4']),'Value'].mean()))

df.head(6)

   Date   Hour Value 3DAverage
0   1/1     1   57   53.333333
1   1/1     2   43   42.333333
2   1/1     3   44   45.333333
3   1/2     1   51   53.333333
4   1/2     2   40   42.333333
5   1/2     3   42   45.333333
YOLO
  • 20,181
  • 5
  • 20
  • 40
  • Hi Manish. Thanks, but I see that my question was unclear. I want to fill this column down for the entire series, so that I always have the "3 day trailing average" for each row. Thank you for your help. I have edited the original question. – Frank Drin Mar 22 '18 at 18:46