3

I have a python dataframe (df1) which has a column time. I converted the column into a datetime series using pd.to_datetime(df1['time']). Now I get a column like this:

2016-08-24 00:00:00  2016-08-13  00:00:00   
2016-08-24 00:00:00  2016-08-13  00:00:00     
2016-08-24 00:00:00  2016-08-13  00:00:00   
2016-08-24 00:00:00  2016-08-13  00:00:00  
2016-08-24 00:00:01  2016-08-13  00:00:01   
2016-08-24 00:00:01  2016-08-13  00:00:01   
2016-08-24 00:00:02  2016-08-13  00:00:02  
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02     
....

2016-08-24 23:59:59  2016-08-13  00:00:02  

Essentially, I want the first column to be grouped by the hour, so that I can see how many entries are there in 1 hour. Any help will be great.

jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
Vijay
  • 33
  • 3

3 Answers3

3

Using @jezrael setup.

df.resample(rule='H', how='count').rename(columns = {'time':'count'})

                      count
2016-08-24 00:00:00      1
2016-08-24 01:00:00      3
2016-08-24 02:00:00      1
Merlin
  • 24,552
  • 41
  • 131
  • 206
  • Yes This works if I use groupby for a single column. Do you know what happens when we use Multi column grouping? – Vijay Aug 28 '16 at 04:04
2

Use resample:

#pandas version 0.18.0 and higher
df = df.resample('H').size()

#pandas version below 0.18.0
#df = df.resample('H', 'size')

print (df)
2016-08-24 00:00:00    1
2016-08-24 01:00:00    3
2016-08-24 02:00:00    1
Freq: H, dtype: int64

If need output as DataFrame:

df = df.resample('H').size().rename('count').to_frame()
print (df)
                     count
2016-08-24 00:00:00      1
2016-08-24 01:00:00      3
2016-08-24 02:00:00      1

Or you can remove from DatetimeIndex minutes and seconds by converting to <M8[h] and then aggregating size:

import pandas as pd

df = pd.DataFrame({'time': {pd.Timestamp('2016-08-24 01:00:00'): pd.Timestamp('2016-08-13 00:00:00'), pd.Timestamp('2016-08-24 01:00:01'): pd.Timestamp('2016-08-13 00:00:01'), pd.Timestamp('2016-08-24 01:00:02'): pd.Timestamp('2016-08-13 00:00:02'), pd.Timestamp('2016-08-24 02:00:02'): pd.Timestamp('2016-08-13 00:00:02'), pd.Timestamp('2016-08-24 00:00:00'): pd.Timestamp('2016-08-13 00:00:00')}})
print (df)
                                   time
2016-08-24 00:00:00 2016-08-13 00:00:00
2016-08-24 01:00:00 2016-08-13 00:00:00
2016-08-24 01:00:01 2016-08-13 00:00:01
2016-08-24 01:00:02 2016-08-13 00:00:02
2016-08-24 02:00:02 2016-08-13 00:00:02

df= df.groupby([df.index.values.astype('<M8[h]')]).size()
print (df)
2016-08-24 00:00:00    1
2016-08-24 01:00:00    3
2016-08-24 02:00:00    1
dtype: int64
jezrael
  • 822,522
  • 95
  • 1,334
  • 1,252
  • My problem is I have multiple columns that I am grouping by. My code currently is df2 = df1['count'].groupby([df1['sc-status],df1[cs-method],df1[time]).count() Using the above code, and using my current data, I get time as it is in my input file (random requests through the hour). I am struggling to go to the next step which is to group this grouped object (df2) every hour. Hope this makes sense – Vijay Aug 28 '16 at 04:01
1

You can use pandas.DatetimeIndex as follows.

import numpy as np
import pandas as pd

# An example of time period
drange = pd.date_range('2016-08-01 00:00:00', '2016-09-01 00:00:00',
                       freq='10min')

N = len(drange)

# The number of columns without 'time' is three.
df = pd.DataFrame(np.random.rand(N, 3))
df['time'] = drange

time_col = pd.DatetimeIndex(df['time'])

gb = df.groupby([time_col.year,
                 time_col.month,
                 time_col.day,
                 time_col.hour])

for col_name, gr in gb:
    print(gr)  # If you want to see only the length, use print(len(gr))

[References] Python Pandas: Group datetime column into hour and minute aggregations

Community
  • 1
  • 1
Daewon Lee
  • 620
  • 4
  • 6
  • Hi @Daewon lee....thank you for the answer. When I use this code, it throws an error saying Series object has no value hour. Any thoughts? – Vijay Aug 28 '16 at 04:05
  • @Vijay Which version of Python do you use? The above code has been test in Anaconda Python 3.5 (64bit) at Windows 10 64bit. (And which version of Pandas do you use? Mine is 0.18.1) – Daewon Lee Aug 29 '16 at 07:23