2

Regards.

I'm Struggling trying to figure out how to do the next operation in pandas:

I have a csv file with timestamps of stations like the following:

head of the file

The next thing I do is the following pivot_table using pandas:

trips.pivot_table('bike', aggfunc='count',
                        index=['date', 'hour'],
                        columns='station_arrived').fillna(0)

returning something like this:

enter image description here

My Problem is the following:

I want to reindex the 'hour' column to have indexes from 0 to 23 hours per day, even if there aren't counts that day.

Doing reindex with only one index is easy, but things get complicated when I'm trying this in a multiindex dataframe

Is there any way to make that possible?

ghost
  • 39
  • 1
  • 6
  • I think you will need to create the empty rows before you make the pivot table. So that would involve coming up with a method to check, for each index, which hours are missing, then generate rows for that index with 0/null values for the missing hours. THEN create the pivot. – Sam Mar 01 '16 at 01:42

1 Answers1

2
import datetime as dt
import pandas as pd
from pandas import Timestamp

df = pd.DataFrame(
    {'action': ['C', 'C', 'C', 'C', 'C', 'A', 'C'],
     'bike': [89, 89, 57, 29, 76, 69, 17],
     'cust_id': [6, 6, 30, 30, 30, 30, 30],
     'date': [Timestamp('2010-02-02 00:00:00'),
              Timestamp('2010-02-02 00:00:00'),
              Timestamp('2010-02-05 00:00:00'),
              Timestamp('2010-02-05 00:00:00'),
              Timestamp('2010-02-05 00:00:00'),
              Timestamp('2010-02-05 00:00:00'),
              Timestamp('2010-02-05 00:00:00')],
     'date_arrived': [Timestamp('2010-02-02 14:27:00'),
                      Timestamp('2010-02-02 15:42:00'),
                      Timestamp('2010-02-05 12:06:00'),
                      Timestamp('2010-02-05 12:07:00'),
                      Timestamp('2010-02-05 13:11:00'),
                      Timestamp('2010-02-05 13:14:00'),
                      Timestamp('2010-02-05 13:45:00')],
     'date_removed': [Timestamp('2010-02-02 13:57:00'),
                      Timestamp('2010-02-02 15:12:00'),
                      Timestamp('2010-02-05 11:36:00'),
                      Timestamp('2010-02-05 11:37:00'),
                      Timestamp('2010-02-05 12:41:00'),
                      Timestamp('2010-02-05 12:44:00'),
                      Timestamp('2010-02-05 13:15:00')],
     'hour': [14, 15, 12, 12, 13, 13, 13],
     'station_arrived': [56, 56, 85, 85, 85, 85, 85],
     'station_removed': [56, 56, 85, 85, 85, 85, 85]})

First, create an hourly index spanning your date range:

idx = pd.date_range(df.date.min(), df.date.max() + dt.timedelta(days=1), freq='H')

Now you want to have a datetime index, so set it to 'date_arrived'. Then use groupby with both TimeGrouper to group on hours and on station_arrived. count the number of non-null station_arrived values. Unstack the results to get the data in the pivot table format.

Finally, use reindex to set the index on your new hourly interval idx index, and fill null values with zero.

>>> (df
     .set_index('date_arrived')
     .groupby([pd.TimeGrouper('H'), 'station_arrived'])
     .station_arrived
     .count()
     .unstack()
     .reindex(idx)
     .fillna(0)
     )
station_arrived      56  85
2010-02-02 00:00:00   0   0
2010-02-02 01:00:00   0   0
2010-02-02 02:00:00   0   0
2010-02-02 03:00:00   0   0
2010-02-02 04:00:00   0   0
2010-02-02 05:00:00   0   0
2010-02-02 06:00:00   0   0
2010-02-02 07:00:00   0   0
2010-02-02 08:00:00   0   0
2010-02-02 09:00:00   0   0
2010-02-02 10:00:00   0   0
2010-02-02 11:00:00   0   0
2010-02-02 12:00:00   0   0
2010-02-02 13:00:00   0   0
2010-02-02 14:00:00   1   0
2010-02-02 15:00:00   1   0
2010-02-02 16:00:00   0   0
...
Alexander
  • 105,104
  • 32
  • 201
  • 196
  • Very impresive and fancy method! I was reading a similar method from this link: https://stackoverflow.com/questions/17287933/filling-in-date-gaps-in-multiindex-pandas-dataframe?rq=1 pointing to the unstack could be a solution but yours just solved it perfectly. Thank you! And sorry for not be able to upvote your solution. not enough points to do so. – ghost Mar 01 '16 at 03:00