
I want to represent a daily schedule, originally given as a CSV file, as a Pandas DataFrame. Each row in the schedule is keyed by a time range within a day, and the ranges do not overlap. For example:

00:00, 01:00, some data
01:00, 03:00, some more data
03:00, 04:30, some other data

How can I create a DataFrame with one level of the index representing the start-to-end time range?
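As a starting point, the CSV above can be loaded with `pd.read_csv`. This is a minimal sketch that assumes the file has no header row; the column names `start`, `end` and `other` are chosen here for illustration and are not part of the original file.

```python
import io

import pandas as pd

# The sample schedule from the question; in practice this would be a file path.
csv_text = """00:00, 01:00, some data
01:00, 03:00, some more data
03:00, 04:30, some other data
"""

# The CSV has no header row, so (assumed) column names are supplied here.
# skipinitialspace=True drops the blank that follows each comma.
df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    names=["start", "end", "other"],
    skipinitialspace=True,
    dtype=str,  # keep "00:00" etc. as plain strings
)
print(df)
```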

Little Bobby Tables

1 Answer


Starting from your example DataFrame (with column names added):

In [78]: df
Out[78]: 
   start    end            other
0  00:00  01:00        some data
1  01:00  03:00   some more data
2  03:00  04:30  some other data

Assuming start and end are strings, we can convert them to datetimes with pd.to_datetime. Because the data contain only times of day, a default date (1900-01-01) is filled in:

In [79]: pd.to_datetime(df['end'], format='%H:%M')
Out[79]: 
0   1900-01-01 01:00:00
1   1900-01-01 03:00:00
2   1900-01-01 04:30:00
Name: end, dtype: datetime64[ns]
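If the placeholder date is unwanted, one possible alternative (not the approach used in this answer) is to parse the strings as offsets from midnight with `pd.to_timedelta`. This sketch assumes every value has the `HH:MM` form, and appends `":00"` so the strings match the `hh:mm:ss` layout that `to_timedelta` parses.

```python
import pandas as pd

df = pd.DataFrame({
    "start": ["00:00", "01:00", "03:00"],
    "end": ["01:00", "03:00", "04:30"],
    "other": ["some data", "some more data", "some other data"],
})

# Turn "HH:MM" into "HH:MM:00" and parse it as a duration since midnight,
# so no dummy 1900-01-01 date is involved.
start = pd.to_timedelta(df["start"] + ":00")
end = pd.to_timedelta(df["end"] + ":00")

df["range"] = end - start
print(df)
```

The resulting `range` column has dtype `timedelta64[ns]`, the same as the one obtained by subtracting the two datetime columns.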

Assuming start and end always fall on the same day, this default date does not matter when the datetimes are only used to calculate the time difference between start and end:

In [80]: df['range'] = pd.to_datetime(df['end'], format='%H:%M') - pd.to_datetime(df['start'], format='%H:%M')


In [81]: df
Out[81]: 
   start    end            other    range
0  00:00  01:00        some data 01:00:00
1  01:00  03:00   some more data 02:00:00
2  03:00  04:30  some other data 01:30:00
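The `range` column above stores each slot's duration. To make the start-to-end range itself the index, as the question asks, one option is `pd.IntervalIndex.from_arrays`. This is a sketch under the same same-day assumption; the index name `slot` and the choice of `closed="left"` are illustrative, not taken from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "start": ["00:00", "01:00", "03:00"],
    "end": ["01:00", "03:00", "04:30"],
    "other": ["some data", "some more data", "some other data"],
})

# Offsets from midnight, as timedeltas.
start = pd.to_timedelta(df["start"] + ":00")
end = pd.to_timedelta(df["end"] + ":00")

# closed="left" makes each interval [start, end), so a boundary time such as
# 01:00 belongs to exactly one row (the one that starts at it).
df.index = pd.IntervalIndex.from_arrays(start, end, closed="left", name="slot")
df = df.drop(columns=["start", "end"])
print(df)

# A time of day can now be looked up directly: 01:30 falls in [01:00, 03:00).
print(df.loc[pd.Timedelta(hours=1, minutes=30), "other"])
```

Half-open intervals suit adjacent, non-overlapping slots like these: every time of day maps to at most one row.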
joris