import datetime as dt
import pandas as pd
from pandas import Timestamp
df = pd.DataFrame(
{'action': ['C', 'C', 'C', 'C', 'C', 'A', 'C'],
'bike': [89, 89, 57, 29, 76, 69, 17],
'cust_id': [6, 6, 30, 30, 30, 30, 30],
'date': [Timestamp('2010-02-02 00:00:00'),
Timestamp('2010-02-02 00:00:00'),
Timestamp('2010-02-05 00:00:00'),
Timestamp('2010-02-05 00:00:00'),
Timestamp('2010-02-05 00:00:00'),
Timestamp('2010-02-05 00:00:00'),
Timestamp('2010-02-05 00:00:00')],
'date_arrived': [Timestamp('2010-02-02 14:27:00'),
Timestamp('2010-02-02 15:42:00'),
Timestamp('2010-02-05 12:06:00'),
Timestamp('2010-02-05 12:07:00'),
Timestamp('2010-02-05 13:11:00'),
Timestamp('2010-02-05 13:14:00'),
Timestamp('2010-02-05 13:45:00')],
'date_removed': [Timestamp('2010-02-02 13:57:00'),
Timestamp('2010-02-02 15:12:00'),
Timestamp('2010-02-05 11:36:00'),
Timestamp('2010-02-05 11:37:00'),
Timestamp('2010-02-05 12:41:00'),
Timestamp('2010-02-05 12:44:00'),
Timestamp('2010-02-05 13:15:00')],
'hour': [14, 15, 12, 12, 13, 13, 13],
'station_arrived': [56, 56, 85, 85, 85, 85, 85],
'station_removed': [56, 56, 85, 85, 85, 85, 85]})
First, create an hourly index spanning your date range:
idx = pd.date_range(df.date.min(), df.date.max() + dt.timedelta(days=1), freq='H')
Now you want to have a datetime index, so set it to 'date_arrived'. Then use groupby
with both TimeGrouper
to group on hours and on station_arrived
. count
the number of non-null station_arrived
values. Unstack the results to get the data in the pivot table format.
Finally, use reindex
to set the index on your new hourly interval idx
index, and fill null values with zero.
>>> (df
.set_index('date_arrived')
.groupby([pd.TimeGrouper('H'), 'station_arrived'])
.station_arrived
.count()
.unstack()
.reindex(idx)
.fillna(0)
)
station_arrived 56 85
2010-02-02 00:00:00 0 0
2010-02-02 01:00:00 0 0
2010-02-02 02:00:00 0 0
2010-02-02 03:00:00 0 0
2010-02-02 04:00:00 0 0
2010-02-02 05:00:00 0 0
2010-02-02 06:00:00 0 0
2010-02-02 07:00:00 0 0
2010-02-02 08:00:00 0 0
2010-02-02 09:00:00 0 0
2010-02-02 10:00:00 0 0
2010-02-02 11:00:00 0 0
2010-02-02 12:00:00 0 0
2010-02-02 13:00:00 0 0
2010-02-02 14:00:00 1 0
2010-02-02 15:00:00 1 0
2010-02-02 16:00:00 0 0
...