Getting the average of a certain hour on weekdays over several years in a pandas dataframe

Question

I have an hourly dataframe in the following format over several years:

Date/Time            Value
01.03.2010 00:00:00  60
01.03.2010 01:00:00  50
01.03.2010 02:00:00  52
01.03.2010 03:00:00  49
.
.
.
31.12.2013 23:00:00  77

I would like to average the data so I can get the average of hour 0, hour 1... hour 23 of each of the years.

So the output should look something like this:

Year Hour           Avg
2010 00              63
2010 01              55
2010 02              50
.
.
.
2013 22              71
2013 23              80

Does anyone know how to obtain this in pandas?

Andy Hayden · Accepted Answer · 2016-06-15T17:30:26.350

23

Note: Now that Series have the dt accessor it's less important that date is the index, though Date/Time still needs to be a datetime64.
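If the column is still stored as strings in the question's day.month.year layout, it may first need converting. A minimal sketch, assuming the column is named "Date/Time" as in the question; the sample rows are copied from the question and the format string is inferred from them:

```python
import pandas as pd

# Sample rows in the question's layout (day.month.year hour:minute:second).
df = pd.DataFrame({
    "Date/Time": ["01.03.2010 00:00:00", "01.03.2010 01:00:00",
                  "01.03.2010 02:00:00", "01.03.2010 03:00:00"],
    "Value": [60, 50, 52, 49],
})

# An explicit format avoids the day/month ambiguity of "01.03.2010"
# (here it is 1 March, not 3 January).
df["Date/Time"] = pd.to_datetime(df["Date/Time"], format="%d.%m.%Y %H:%M:%S")

print(df.dtypes)
```

After this, df["Date/Time"].dt.year and df["Date/Time"].dt.hour are available for the groupby.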

Update: You can do the groupby more directly (without the lambda):

In [21]: df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()
Out[21]:
                     Value
Date/Time Date/Time
2010      0             60
          1             50
          2             52
          3             49

In [22]: res = df.groupby([df["Date/Time"].dt.year, df["Date/Time"].dt.hour]).mean()

In [23]: res.index.names = ["year", "hour"]

In [24]: res
Out[24]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49
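The title also mentions weekdays, which the groupby above does not filter on. A rough sketch of one way to add that and get the flat Year / Hour / Avg layout from the question, assuming "weekdays" means Monday to Friday; the data here is made up for illustration:

```python
import pandas as pd

# Hypothetical hourly data crossing a year boundary (30 Dec 2010 to 4 Jan 2011).
idx = pd.date_range("2010-12-30", "2011-01-04 23:00", freq="h")
df = pd.DataFrame({"Date/Time": idx, "Value": range(len(idx))})

# Assumption: weekdays are Monday-Friday, i.e. dayofweek 0-4.
weekdays = df[df["Date/Time"].dt.dayofweek < 5]

res = (
    weekdays.groupby([weekdays["Date/Time"].dt.year.rename("Year"),
                      weekdays["Date/Time"].dt.hour.rename("Hour")])["Value"]
    .mean()
    .rename("Avg")
    .reset_index()   # turn the (Year, Hour) MultiIndex into ordinary columns
)

print(res.head())
```

Renaming the key Series before grouping names the index levels directly, so no separate res.index.names assignment is needed.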

If it's a datetime64 index you can do:

In [31]: df1.groupby([df1.index.year, df1.index.hour]).mean()
Out[31]:
        Value
2010 0     60
     1     50
     2     52
     3     49

Old answer (will be slower):

Assuming Date/Time was the index* you can use a mapping function in the groupby:

In [11]: year_hour_means = df1.groupby(lambda x: (x.year, x.hour)).mean()

In [12]: year_hour_means
Out[12]:
           Value
(2010, 0)     60
(2010, 1)     50
(2010, 2)     52
(2010, 3)     49

For a more useful index, you could then create a MultiIndex from the tuples:

In [13]: year_hour_means.index = pd.MultiIndex.from_tuples(year_hour_means.index,
                                                           names=['year', 'hour'])

In [14]: year_hour_means
Out[14]:
           Value
year hour
2010 0        60
     1        50
     2        52
     3        49

* if not, then first use set_index:

df1 = df.set_index('Date/Time')
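If the attributes to group on need to be chosen at run time, a named function can take the place of the lambda. A sketch only: by_attrs is a hypothetical helper, not part of pandas, and the data is invented:

```python
import pandas as pd

def by_attrs(*attrs):
    """Build a grouping function that maps an index label (a Timestamp)
    to a tuple of the requested attributes, e.g. ("year", "hour")."""
    def key(ts):
        return tuple(getattr(ts, a) for a in attrs)
    return key

# Hypothetical data: two days of hourly values with a datetime index.
idx = pd.date_range("2010-03-01", periods=48, freq="h")
df1 = pd.DataFrame({"Value": range(48)}, index=idx)

means = df1.groupby(by_attrs("year", "hour")).mean()
means.index = pd.MultiIndex.from_tuples(list(means.index),
                                        names=["year", "hour"])

print(means.head())
```

Like the lambda version, this calls a Python function once per index label, so it is likely to be slower than grouping on df1.index.year and df1.index.hour.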

edited Jun 15 '16 at 17:30

answered Jun 06 '13 at 16:33

Andy Hayden


Thanks a lot. I had been trying with loops but this is a much better way. – Markus W Jun 10 '13 at 13:53
P.S.: Does anybody know how you can pass the "x.year" or "x.hour" of "df1.groupby(lambda x: (x.year, x.hour)).mean()" into the lambda function as a dynamic parameter? Defining Variable1=x.year and Variable2=x.hour for "df1.groupby(lambda x: (Variable1, Variable2)).mean()" does not seem to work. – Markus W Jun 24 '13 at 09:32
@MarkusW You should ask that as a new question :)... it sounds like you want to use a proper function (i.e. not a lambda) – Andy Hayden Jun 24 '13 at 09:34
@AndyHayden you are a genius. Could you clarify something: does a lambda function always default to using the index? Then given a multiindex, this defaults to a tuple of that multiple index? – Little Bobby Tables Jun 15 '16 at 10:55

@josh yes, though you can pass `as_index=False` to override that. In re-reading this question I would do something different. Updated with a much better way to do this (which happens to create the multiindex directly). – Andy Hayden Jun 15 '16 at 17:30
How would I group by 10 minutes, given that I have a DatetimeIndex in 10-minute intervals over multiple days? – chaikov Dec 11 '19 at 02:19

score 2 · Answer 2 · answered Dec 08 '14 at 17:32

If your date/time column is in the datetime format and set as the index (see dateutil.parser for automatic parsing options), you can use pandas resample as below:

year_hour_means = df.resample('H').mean()

which will keep your data in the datetime format. This may help you with whatever you are going to be doing with your data down the line.
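Note that resample bins along the time axis, so it does not by itself average the same hour across different days. A small sketch contrasting the two on made-up data, assuming a frame with a datetime index and a "Value" column:

```python
import pandas as pd

# Hypothetical data: three days of hourly values with a datetime index.
idx = pd.date_range("2010-03-01", periods=72, freq="h")
df = pd.DataFrame({"Value": range(72)}, index=idx)

# resample: one row per calendar day, index stays a DatetimeIndex.
daily = df.resample("D").mean()

# groupby on the hour: one row per hour of day, averaged across all days.
by_hour = df.groupby(df.index.hour).mean()

print(daily)
print(by_hour.head())
```

Here daily has three rows and by_hour has 24, which is the shape the question asks for (per year, once the year is added as a second key).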

answered Dec 08 '14 at 17:32

enmyj


This doesn't average from one day to the next though – endolith Jul 03 '16 at 21:13
@endolith Try daily_average = df.resample('D').mean() where df has datetimeindex – enmyj Jul 05 '16 at 01:27
