using Python, How to group a column in Dataframe by the hour?

Question

I have a python dataframe (df1) which has a column time. I converted the column into a datetime series using pd.to_datetime(df1['time']). Now I get a column like this:

2016-08-24 00:00:00  2016-08-13  00:00:00   
2016-08-24 00:00:00  2016-08-13  00:00:00     
2016-08-24 00:00:00  2016-08-13  00:00:00   
2016-08-24 00:00:00  2016-08-13  00:00:00  
2016-08-24 00:00:01  2016-08-13  00:00:01   
2016-08-24 00:00:01  2016-08-13  00:00:01   
2016-08-24 00:00:02  2016-08-13  00:00:02  
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02     
....

2016-08-24 23:59:59  2016-08-13  00:00:02

Essentially, I want the first column to be grouped by the hour, so that I can see how many entries are there in 1 hour. Any help will be great.

score 3 · Accepted Answer · answered Aug 25 '16 at 05:17

3

Using @jezrael setup.

df.resample(rule='H', how='count').rename(columns = {'time':'count'})

                      count
2016-08-24 00:00:00      1
2016-08-24 01:00:00      3
2016-08-24 02:00:00      1

answered Aug 25 '16 at 05:17

Merlin

24,552
41
131
206

Yes This works if I use groupby for a single column. Do you know what happens when we use Multi column grouping? – Vijay Aug 28 '16 at 04:04

jezrael · Answer 2 · 2016-08-25T07:06:58.387

Use resample:

#pandas version 0.18.0 and higher
df = df.resample('H').size()

#pandas version below 0.18.0
#df = df.resample('H', 'size')

print (df)
2016-08-24 00:00:00    1
2016-08-24 01:00:00    3
2016-08-24 02:00:00    1
Freq: H, dtype: int64

If need output as DataFrame:

df = df.resample('H').size().rename('count').to_frame()
print (df)
                     count
2016-08-24 00:00:00      1
2016-08-24 01:00:00      3
2016-08-24 02:00:00      1

Or you can remove from DatetimeIndex minutes and seconds by converting to <M8[h] and then aggregating size:

import pandas as pd

df = pd.DataFrame({'time': {pd.Timestamp('2016-08-24 01:00:00'): pd.Timestamp('2016-08-13 00:00:00'), pd.Timestamp('2016-08-24 01:00:01'): pd.Timestamp('2016-08-13 00:00:01'), pd.Timestamp('2016-08-24 01:00:02'): pd.Timestamp('2016-08-13 00:00:02'), pd.Timestamp('2016-08-24 02:00:02'): pd.Timestamp('2016-08-13 00:00:02'), pd.Timestamp('2016-08-24 00:00:00'): pd.Timestamp('2016-08-13 00:00:00')}})
print (df)
                                   time
2016-08-24 00:00:00 2016-08-13 00:00:00
2016-08-24 01:00:00 2016-08-13 00:00:00
2016-08-24 01:00:01 2016-08-13 00:00:01
2016-08-24 01:00:02 2016-08-13 00:00:02
2016-08-24 02:00:02 2016-08-13 00:00:02

df= df.groupby([df.index.values.astype('<M8[h]')]).size()
print (df)
2016-08-24 00:00:00    1
2016-08-24 01:00:00    3
2016-08-24 02:00:00    1
dtype: int64

My problem is I have multiple columns that I am grouping by. My code currently is df2 = df1['count'].groupby([df1['sc-status],df1[cs-method],df1[time]).count() Using the above code, and using my current data, I get time as it is in my input file (random requests through the hour). I am struggling to go to the next step which is to group this grouped object (df2) every hour. Hope this makes sense — Vijay, Aug 28 '16 at 04:01

score 1 · Answer 3 · edited May 23 '17 at 11:51

1

You can use pandas.DatetimeIndex as follows.

import numpy as np
import pandas as pd

# An example of time period
drange = pd.date_range('2016-08-01 00:00:00', '2016-09-01 00:00:00',
                       freq='10min')

N = len(drange)

# The number of columns without 'time' is three.
df = pd.DataFrame(np.random.rand(N, 3))
df['time'] = drange

time_col = pd.DatetimeIndex(df['time'])

gb = df.groupby([time_col.year,
                 time_col.month,
                 time_col.day,
                 time_col.hour])

for col_name, gr in gb:
    print(gr)  # If you want to see only the length, use print(len(gr))

[References] Python Pandas: Group datetime column into hour and minute aggregations

edited May 23 '17 at 11:51

Community

1
1

answered Aug 25 '16 at 01:15

Daewon Lee

620
4
6

Hi @Daewon lee....thank you for the answer. When I use this code, it throws an error saying Series object has no value hour. Any thoughts? – Vijay Aug 28 '16 at 04:05
@Vijay Which version of Python do you use? The above code has been test in Anaconda Python 3.5 (64bit) at Windows 10 64bit. (And which version of Pandas do you use? Mine is 0.18.1) – Daewon Lee Aug 29 '16 at 07:23

using Python, How to group a column in Dataframe by the hour?

3 Answers3