85

This seems like it should be fairly straightforward, but after nearly an entire day I have not found the solution. I've loaded my DataFrame with read_csv and easily parsed, combined and indexed a date and a time column into one column, but now I want to reshape and perform calculations based on hour and minute groupings, similar to what you can do in an Excel pivot table.

I know how to resample to hour or minute, but it maintains the date portion associated with each hour/minute, whereas I want to aggregate the data set ONLY by hour and minute, similar to grouping in Excel pivots by selecting "hour" and "minute" but not selecting anything else.

Any help would be greatly appreciated.

horatio1701d
  • 8,809
  • 14
  • 48
  • 77
  • Would it help to get a `time` object from each `datetime` one you have? You could create a `pandas.Series` object from your `dataframe.index` and then assign it to the index (replacing the current one). Could you "print" some rows of your dataframe? – heltonbiker Apr 28 '13 at 18:18
  • 1
    Thank you. I'm not familiar with using time object to get the time from the datetime column if that's what you mean. I just figured out one way that is extremely close to what I need using the following code for hourly and minutely respectively but is there an easier way to do it, especially a way to have hourly and minute together?: hourly = ims_havas.groupby(ims_havas.index.hour).sum() – horatio1701d Apr 28 '13 at 18:34

5 Answers

77

Can't you do the following, where df is your DataFrame:

times = pd.to_datetime(df.timestamp_col)
df.groupby([times.dt.hour, times.dt.minute]).value_col.sum()
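For reference, here is a minimal runnable sketch of this approach (the column names `timestamp_col` and `value_col` are illustrative, matching the answer above):

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp_col": ["2022-06-28 13:05:10", "2022-06-29 13:05:45",
                      "2022-06-28 14:30:00"],
    "value_col": [1, 2, 3],
})

times = pd.to_datetime(df.timestamp_col)
# Rows from different dates collapse into the same (hour, minute) bucket
result = df.groupby([times.dt.hour, times.dt.minute]).value_col.sum()
print(result)
# (13, 5) -> 3   (values from June 28 and June 29 combined)
# (14, 30) -> 3
```

Note that the two 13:05 rows fall on different dates but end up in the same group, which is exactly the Excel-pivot-style behaviour the question asks for.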
redspidermkv
  • 503
  • 10
  • 25
Wes McKinney
  • 101,437
  • 32
  • 142
  • 108
  • 1
    Yes that works perfectly for me too, but I have a follow-up question: how can I use this "grouped time series" as my x-axis in a matplotlib plot? – 2705114-john Mar 12 '14 at 21:54
  • 27
    I had to do with `df.groupby([times.dt.hour, times.dt.minute]) ... ` – akilat90 Oct 26 '17 at 05:29
  • 10
    Does this work in Python 3? The pd.to_datetime function appears to create a pandas.core.series.Series object, but without any datetime features. I get "AttributeError: 'Series' object has no attribute 'hour'". – Adrian Keister Sep 10 '18 at 17:18
  • 7
    @AdrianKeister it works, you just have to put the prefix dt. In this specific case it would go like `times.dt.hour`, like @akilat90 stated. – Roughmar Jun 04 '19 at 10:42
53

Wes' code didn't work for me. But the DatetimeIndex function (docs) did:

times = pd.DatetimeIndex(data.datetime_col)
grouped = data.groupby([times.hour, times.minute])

The DatetimeIndex object is a representation of times in pandas. The first line creates an array of the datetimes. The second line uses this array's hour and minute attributes to group (docs) the rows by those values.
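A self-contained sketch of this variant (the `datetime_col` and `value` names are illustrative):

```python
import pandas as pd

data = pd.DataFrame({
    "datetime_col": pd.to_datetime(["2022-06-28 09:15:00",
                                    "2022-06-29 09:15:30"]),
    "value": [10, 5],
})

times = pd.DatetimeIndex(data.datetime_col)
# .hour and .minute are plain integer arrays on the DatetimeIndex itself,
# so no .dt accessor is needed here
grouped = data.groupby([times.hour, times.minute]).value.sum()
print(grouped)
# (9, 15) -> 15
```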

Nix G-D
  • 757
  • 6
  • 8
21

Came across this when I was searching for this type of groupby. Wes' code above didn't work for me; I'm not sure if that's because of changes in pandas over time.

In pandas 0.16.2, what I did in the end was:

grp = data.groupby(by=[data.datetime_col.map(lambda x : (x.hour, x.minute))])
grp.count()

You'd have (hour, minute) tuples as the grouped index. If you want a multi-index:

grp = data.groupby(by=[data.datetime_col.map(lambda x : x.hour),
                       data.datetime_col.map(lambda x : x.minute)])
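A runnable sketch of the tuple-key variant above (column names illustrative):

```python
import pandas as pd

data = pd.DataFrame({
    "datetime_col": pd.to_datetime(["2022-06-28 10:30:00",
                                    "2022-06-29 10:30:00"]),
    "value": [1, 1],
})

# Each row's key is an (hour, minute) tuple, so the date is ignored
grp = data.groupby(data.datetime_col.map(lambda x: (x.hour, x.minute)))
print(grp.count())
# one group, (10, 30), containing both rows
```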
WillZ
  • 3,775
  • 5
  • 30
  • 38
7

I have an alternative to Wes' and Nix's answers above, with just one line of code. Assuming your column is already a datetime column, you don't need to get the hour and minute attributes separately:

df.groupby(df.timestamp_col.dt.time).value_col.sum()
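One caveat worth noting: `.dt.time` keeps the seconds as well, so rows only merge when they match down to the second. A minimal sketch (column names illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "timestamp_col": pd.to_datetime(["2022-06-28 08:00:05",
                                     "2022-06-29 08:00:05"]),
    "value_col": [4, 6],
})

# .dt.time yields datetime.time objects (hour, minute AND second),
# so this groups at second granularity, not just hour/minute
out = df.groupby(df.timestamp_col.dt.time).value_col.sum()
print(out)
# 08:00:05 -> 10
```

If you only want hour/minute buckets, the `.dt.hour`/`.dt.minute` approach from the accepted answer is the closer fit.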
tsando
  • 4,557
  • 2
  • 33
  • 35
2

This might be a little late, but I found quite a good solution for anyone who has the same problem. I have a df like this:

datetime              value
2022-06-28 13:28:08   15
2022-06-28 13:28:09   30
...                   ...
2022-06-28 14:29:11   20
2022-06-28 14:29:12   10

I want to convert those timestamps, which are at one-second intervals, to timestamps at one-minute intervals, summing the value column in the process. There is a neat way of doing it:

df['datetime'] = pd.to_datetime(df['datetime']) #if not already as datetime object
grouped = df.groupby(pd.Grouper(key='datetime', axis=0, freq='T')).sum()
print(grouped.head())

Result:

datetime              value
2022-06-28 13:28:00   45
...                   ...
2022-06-28 14:29:00   30

freq='T' stands for minutes. You could also group by hours or days. These frequency strings are called offset aliases.

  • This is great for aggregating by e.g. 5 minute or 15 minute interval, but I think the OP (myself as well) was looking for a way to count by time interval without the date, so that for instance during a 30 day month count all entries which occurred between 8:00 and 8:14, and all entries which occurred between 8:15 and 8:29, regardless of the days on which they occurred. I've tried several variations myself but cannot find a way to both resample to 15T and also ignore the date portion of datetime. I can do one or the other, but not both. – Don 'FreihEitner' Eitner Dec 24 '22 at 23:46
  • Disregard my last post. My problem has been using .resample() to get times in 15 minute increments but not recognizing this was also aggregating rows. I'm now using a lambda function on each row found and math.floor(dt.minute/15)*15 to extract the 15 minute interval for each row, then working with a column of only those time values. – Don 'FreihEitner' Eitner Dec 27 '22 at 02:38