2

I have some time series data that spans multiple days, like so:

dr = pd.date_range('01-01-2020 9:00', '01-03-2020 23:59', freq='1T')
df = pd.DataFrame({'data': 1}, index=dr) # all ones in the data column

I am interested in grouping by the time of day and summing the data (i.e., combine data across dates). I have gotten from this post and this one that you group data into the hour of the day by using the time attributes of a datetime series or index, like so:

df.groupby(df.index.hour).sum()
df.groupby(df.index.time).sum()

However, I want to group into 15 minute bins, e.g. something like this (numbers are arbitrary):

      data
00:00   10
00:15   12
00:30   15
...
11:30   16
11:45   20

Note that I do not want to just do a 15 minute resample (e.g. df.resample('15T').sum()), as this doesn't group similar times across days. So for example, data between 9:00 and 9:15 on any date should be placed in the same bin.

I can't find a time attribute that would achieve this. How can I do so?

Tom
  • 8,310
  • 2
  • 16
  • 36
  • dfX = df.groupby(pd.Grouper(freq='15Min')).aggregate(numpy.sum) – Joe Ferndz Jan 21 '21 at 03:56
  • @JoeFerndz that is not putting the same time of day on different dates into the same bin. So like 9:00 on the 1st and 9:00 on the 2nd are placed in separate bins (not what I want). Sorry if this wasn't clear, I updated my post – Tom Jan 21 '21 at 04:16
  • so your input could be 00:01 3, 00:02 4, 00:008 2, 00:14 2, 00:20 5... then you want 00:15 to be (3 + 4 + 2 = 9)? And this has to be on the same day. So 9 pm today should be bin together and not mixed with yesterday's 9 pm – Joe Ferndz Jan 21 '21 at 04:20
  • @JoeFerndz, in your example, the first four times you listed all occur between midnight and 12:15, so they should all be binned, so you should get `00:00 11` (3+4+2+2). But the last part sounds incorrect, *I do want today's 9pm to be mixed with yesterday's 9pm*. I want to look at all timestamps as if they are on the same date, and then resample at 15 minutes, so that I can see what for example an average 9pm looks like across days. Both answers so far seem to do so correctly – Tom Jan 21 '21 at 14:29

2 Answers2

3

You can first take the floor of the timestamps at a certain frequency, and then access their time to group the data:

>>> df.groupby(df.index.floor('15T').time).sum()

          data
00:00:00    30
00:15:00    30
00:30:00    30
00:45:00    30
01:00:00    30
       ...
22:45:00    45
23:00:00    45
23:15:00    45
23:30:00    45
23:45:00    45

[96 rows x 1 columns]

This should work for all other minute frequencies that evenly divide the hour (1, 2, 3, 4, 5, 6, 10, 12, 15, 20, 30, or 60 minutes).

Tom
  • 8,310
  • 2
  • 16
  • 36
2

Try using the following:

df.index = pd.to_datetime(df.index.time, format='%H:%M:%S')
print(df.resample('15T').sum())

Output:

                     data
1900-01-01 00:00:00    31
1900-01-01 00:15:00    30
1900-01-01 00:30:00    30
1900-01-01 00:45:00    30
1900-01-01 01:00:00    30
1900-01-01 01:15:00    30
1900-01-01 01:30:00    30
1900-01-01 01:45:00    30
1900-01-01 02:00:00    30
1900-01-01 02:15:00    30
1900-01-01 02:30:00    30
1900-01-01 02:45:00    30
1900-01-01 03:00:00    30
1900-01-01 03:15:00    30
1900-01-01 03:30:00    30
1900-01-01 03:45:00    30
1900-01-01 04:00:00    30
1900-01-01 04:15:00    30
1900-01-01 04:30:00    30
1900-01-01 04:45:00    30
1900-01-01 05:00:00    30
1900-01-01 05:15:00    30
1900-01-01 05:30:00    30
1900-01-01 05:45:00    30
1900-01-01 06:00:00    30
1900-01-01 06:15:00    30
1900-01-01 06:30:00    30
1900-01-01 06:45:00    30
1900-01-01 07:00:00    30
1900-01-01 07:15:00    30
...                   ...
1900-01-01 16:30:00    45
1900-01-01 16:45:00    45
1900-01-01 17:00:00    45
1900-01-01 17:15:00    45
1900-01-01 17:30:00    45
1900-01-01 17:45:00    45
1900-01-01 18:00:00    45
1900-01-01 18:15:00    45
1900-01-01 18:30:00    45
1900-01-01 18:45:00    45
1900-01-01 19:00:00    45
1900-01-01 19:15:00    45
1900-01-01 19:30:00    45
1900-01-01 19:45:00    45
1900-01-01 20:00:00    45
1900-01-01 20:15:00    45
1900-01-01 20:30:00    45
1900-01-01 20:45:00    45
1900-01-01 21:00:00    45
1900-01-01 21:15:00    45
1900-01-01 21:30:00    45
1900-01-01 21:45:00    45
1900-01-01 22:00:00    45
1900-01-01 22:15:00    45
1900-01-01 22:30:00    45
1900-01-01 22:45:00    45
1900-01-01 23:00:00    45
1900-01-01 23:15:00    45
1900-01-01 23:30:00    45
1900-01-01 23:45:00    45

[96 rows x 1 columns]
U13-Forward
  • 69,221
  • 14
  • 89
  • 114