1

I have lovely code that makes a list of averages of all entries on a given timeindex period. For example, on a dataframe with ten years of data, it will return the average values for each day of the week. What I want to do is propagate these average values back to the entire dataframe in as few lines as possible.

I'm reasonably proud of the code to make the averages. For example, take a dataframe that has this data (I am listing "day of week" for your convenience since this value is implicit in the datetimeindex and this column does not exist in my own data):

Datetimeindex   day of week value
2019-01-01      3            60
2019-01-02      4            46
2019-01-03      5            72
2019-01-04      6            16
2019-01-05      7            94
2019-01-06      1            100
2019-01-07      2            41
2019-01-08      3            63
2019-01-09      4            3
2019-01-10      5            52
2019-01-11      6            38
2019-01-12      7            43
2019-01-13      1            22
2019-01-14      2            73
2019-01-15      3            72
2019-01-16      4            30
2019-01-17      5            52
2019-01-18      6            25
2019-01-19      7            28
2019-01-20      1            83
2019-01-21      2            98
2019-01-22      3            9
2019-01-23      4            61
2019-01-24      5            88
2019-01-25      6            54
2019-01-26      7            32
2019-01-27      1            32
2019-01-28      2            41
2019-01-29      3            65
2019-01-30      4            24
2019-01-31      5            92
2019-02-01      6            76
2019-02-02      7            47
2019-02-03      1            80
2019-02-04      2            62

My code goes like this:

result = time_series.groupby(time_series.index.weekday).mean()

This provides me with a len(7) vector of averages for each day of the week that looks like this:

day of week mean
1           63.4
2           63
3           53.8
4           32.8
5           71.2
6           41.8
7           48.8

What I would simply like to do is reverse the formula and paste the averages back into the original source dataframe. Taking the example above, the desired result would be:

Datetimeindex   value
2019-01-01      53.8
2019-01-02      32.8
2019-01-03      71.2
2019-01-04      41.8
2019-01-05      48.8
2019-01-06      63.4
2019-01-07      63
2019-01-08      52.56
2019-01-09      30.16
2019-01-10      71.04
2019-01-11      46.96
2019-01-12      39.76
2019-01-13      63.4
2019-01-14      63
2019-01-15      52.56
2019-01-16      30.16
2019-01-17      71.04
2019-01-18      46.96
2019-01-19      39.76
2019-01-20      63.4
2019-01-21      63
2019-01-22      52.56
2019-01-23      30.16
2019-01-24      71.04
2019-01-25      46.96
2019-01-26      39.76
2019-01-27      63.4
2019-01-28      63
2019-01-29      52.56
2019-01-30      30.16
2019-01-31      71.04
2019-02-01      46.96
2019-02-02      39.76
2019-02-03      63.4
2019-02-04      63

It feels like it should be possible to simply reverse the original formula and say: time_series.groupby(time_series.index.weekday) = result

But that clearly doesn't work. Or even if it was possible to perform the mean() process in place, such as:

result = time_series.groupby(time_series.index.weekday).mean(inplace=True)

or

result.loc[result.groupby(by=val)] = time_series.groupby(by=val).mean().values

but neither work. I'd love to do this in a one-liner and I feel so close, but I simply can't make it work. Do you have any ideas?

DK.
  • 43
  • 1
  • 1
  • 6
  • 2
    just do a transformed mean `time_series.groupby(time_series.index.weekday)['value'].transform('mean')` – anky Sep 01 '19 at 13:12
  • 1
    Thank you @anky_91 can you post your code as an answer so I can select it? – DK. Sep 01 '19 at 14:07

3 Answers3

2

IIUC, this should work

result = time_series.groupby('day of week')['value'].transform('mean')
moys
  • 7,747
  • 2
  • 11
  • 42
  • Hi there, I am having trouble implementing this as 'mean' is a series in a different variable. Should it be replaced with `time_series.groupby(time_series.index.weekday).mean()`? – DK. Sep 01 '19 at 13:32
  • Use this directly. This should give the result you have shown. My assumption here is that the name of your dataframe is 'time_series' & it contains 'day of week ' & 'value' as columns – moys Sep 01 '19 at 13:42
  • Thanks, but I think there may be some misunderstanding - the variable above does not exist. Do you mean that the code should be as follows: `result = time_series.groupby(time_series.index.weekday).mean()` `result = time_series.groupby('day of week')['value'].transform('mean')` Because as written, there is no columns named day of week, value, or mean yet: if the first line of this code is not run, those numbers do not exist. – DK. Sep 01 '19 at 13:47
  • Oh! You don’t have the column day of week. Then I think you can use what @anky_91 has put in( In the comment to your question ). – moys Sep 01 '19 at 13:50
  • Thank you for your clarity, I will ask anky to post their answer so I can select it! – DK. Sep 01 '19 at 14:07
1

calling df to your dataframe:

Datetimeindex   day_of_week value
2019-01-01      3            60
2019-01-02      4            46
2019-01-03      5            72
2019-01-04      6            16
2019-01-05      7            94
2019-01-06      1            100
2019-01-07      2            41
2019-01-08      3            63
2019-01-09      4            3
2019-01-10      5            52
2019-01-11      6            38
2019-01-12      7            43
2019-01-13      1            22
2019-01-14      2            73
2019-01-15      3            72
2019-01-16      4            30
2019-01-17      5            52
2019-01-18      6            25
2019-01-19      7            28
2019-01-20      1            83
2019-01-21      2            98
2019-01-22      3            9
2019-01-23      4            61
2019-01-24      5            88
2019-01-25      6            54
2019-01-26      7            32
2019-01-27      1            32
2019-01-28      2            41
2019-01-29      3            65
2019-01-30      4            24
2019-01-31      5            92
2019-02-01      6            76
2019-02-02      7            47
2019-02-03      1            80
2019-02-04      2            62

Use:

df2=df.replace(df.groupby('day_of_week')['value'].mean())['day_of_week'].to_frame().rename(columns={'day_of_week':'value'})

Output:

print(df2)

               value
Datetimeindex       
2019-01-01      53.8
2019-01-02      32.8
2019-01-03      71.2
2019-01-04      41.8
2019-01-05      48.8
2019-01-06      63.4
2019-01-07      63.0
2019-01-08      53.8
2019-01-09      32.8
2019-01-10      71.2
2019-01-11      41.8
2019-01-12      48.8
2019-01-13      63.4
2019-01-14      63.0
2019-01-15      53.8
2019-01-16      32.8
2019-01-17      71.2
2019-01-18      41.8
2019-01-19      48.8
2019-01-20      63.4
2019-01-21      63.0
2019-01-22      53.8
2019-01-23      32.8
2019-01-24      71.2
2019-01-25      41.8
2019-01-26      48.8
2019-01-27      63.4
2019-01-28      63.0
2019-01-29      53.8
2019-01-30      32.8
2019-01-31      71.2
2019-02-01      41.8
2019-02-02      48.8
2019-02-03      63.4
2019-02-04      63.0
ansev
  • 30,322
  • 5
  • 17
  • 31
1

Use transform in this case with a groupby:

time_series.groupby(time_series.index.weekday)['value'].transform('mean')

2019-01-01    53.8
2019-01-02    32.8
2019-01-03    71.2
2019-01-04    41.8
2019-01-05    48.8
2019-01-06    63.4
2019-01-07    63.0
2019-01-08    53.8
2019-01-09    32.8
2019-01-10    71.2
2019-01-11    41.8
2019-01-12    48.8
2019-01-13    63.4
2019-01-14    63.0
......
.....
anky
  • 74,114
  • 11
  • 41
  • 70