generate date_range for string intervals like 09:00-10:00/11:00-14:00/15:00-18:00 with BuiltIn functions for pandas

Question

I've been reading the forum, investigating on internet. But can't figure out how to apply a pandas functions to resume this whole code:

def get_time_and_date(schedule, starting_date, position):
    # calculate time and date for each start and ending time if the ending time < starting time, add one day to the ending.
    my_time = datetime.strptime(schedule.split('-')[position], '%H:%M')
    my_date = datetime.strptime(starting_date, '%Y-%m-%d')
    # get the starting hour for the range if we are calculating last interval
    if position == 1: 
        starting_hour = datetime.strptime(schedule.split('-')[0], '%H:%M')
        starting_hour = datetime(my_date.year, my_date.month, my_date.day, starting_hour.hour, 0)
    # convert unify my_time and my_date normalizing the minutes
    if hora.minute >= 30:
        my_hour_and_date = datetime(my_date.year, my_date.month, my_date.day, my_time.hour, 30)
    else:
        my_hour_and_date = datetime(my_date.year, my_date.month, my_date.day, hora.hour, 0)
    # if final time of the day < than starting time, means there is a day jump, so we add a day
    if position == 1 and my_hour_and_date < starting_hour: my_hour_and_date += timedelta(days=1)
    return my_hour_and_date


def get_time_interval_ranges(schedule, my_date):
    # get all match schedules if there are any
    schedules = schedule.split('/')
    intervals_list = []        
    # loop through al the schedules and add the range with the split separator "Separa aquí"
    for my_schedule in schedules:
        current_range = pd.date_range(start=get_time_and_date(my_schedule, my_date, 0), end=get_time_and_date(my_schedule, my_date, 1), freq="30min").strftime('%Y-%m-%d, %H:%M').to_list()        
        intervals_list += current_range
        intervals_list.append('separate_range_here')  
    return intervals_list


def generate_time_intervals(df, column_to_process, new_column):
    #generate range of times column
    df[new_column] = df.apply(lambda row: get_time_interval_ranges(row[column_to_process], row['my_date'], True), axis=1)
    return df

I believe there is a better way to do this, but I can't find out how. What I'm giving to the first function(generate_time_intervals) is a dataFrame with some columns but only Date (yyyy-mm-dd) and schedule are important.

When the schedule is 09:00-15:00 it's easy, just split by the "-" and give it to the builtint function data_range. The problem comes to handle horrendous times like the one on the title or the likes of 09:17-16:24.

Is there any way to handle this without so much looping and the sorts in my code?

Edit:

With this input:

Worker	Date	Schedule
Worker1	2022-05-01	09:00-10:00/11:00-14:00/15:00-18:00
Worker2	2022-05-01	09:37-15:38

I would like this output:

Date	Interval	Working Minutes
2022-05-01	09:00	30
2022-05-01	09:30	53
2022-05-01	10:00	30
2022-05-01	10:30	30
2022-05-01	11:00	60
2022-05-01	11:30	60
2022-05-01	12:00	60
2022-05-01	12:30	60
2022-05-01	13:00	60
2022-05-01	13:30	60
2022-05-01	14:00	30
2022-05-01	14:30	30
2022-05-01	15:00	60
2022-05-01	15:30	38
2022-05-01	16:00	30
2022-05-01	16:30	30
2022-05-01	17:00	30
2022-05-01	17:30	30
2022-05-01	18:00	0

Something that comes to mind is to convert them to `epoch`, round accordingly and convert back to datetime. See [converting datetime to epoch](https://stackoverflow.com/questions/35630098/convert-a-column-of-datetimes-to-epoch-in-python) — Zaero Divide, May 18 '22 at 18:27

Zaero Divide · Accepted Answer · 2022-05-19T00:04:13.750

Working with datetime:

df= pd.DataFrame({'schedule':['09:17-16:24','19:40-21:14']})
schedules = df.schedule.str.split('-',expand=True)
start = pd.to_datetime(schedules[0]).dt.round('H')
end = pd.to_datetime(schedules[1]).dt.round('H')

df['interval_out'] = start.dt.hour.astype(str) + ':00 - ' + end.dt.hour.astype(str) + ':00'

And result:

>>> df
      schedule
0  09:17-16:24
1  19:40-21:14

>>> schedules
       0      1
0  09:17  16:24
1  19:40  21:14

>>> start
0   2022-05-18 09:00:00
1   2022-05-18 20:00:00
Name: 0, dtype: datetime64[ns]
>>> end
0   2022-05-18 16:00:00
1   2022-05-18 21:00:00
Name: 1, dtype: datetime64[ns]

>>> df
      schedule   interval_out
0  09:17-16:24   9:00 - 16:00
1  19:40-21:14  20:00 - 21:00
>>>

Of course the rounding should be floor & ceil if you want to expand it...

EDIT: Trying the original question... It also helps if you read about datetime functions in Pandas (which now I learnt...):facepalm:

Expand the blocks into individual items start/stop
Floor / ceil them for the start/stop
Calculate the intervals using a convenient pandas function...
Explode the intervals as rows
Calculate the late start
Calculate soon stop
Calculate how many people were actually in the office
Group data on slots, adding lost minutes and worked minutes * worker
Do the calculation

df['timeblocks']= df.Schedule.str.split('/')
df2 = df.explode('timeblocks')
timeblocks = df2.timeblocks.str.split('-',expand=True)
df2['start'] = pd.to_datetime(df2.Date + " " + timeblocks[0])
df2['stop'] = pd.to_datetime(df2.Date + " " + timeblocks[1])
df2['start_slot'] = df2.start.dt.floor('30min')
df2['stop_slot'] = df2.stop.dt.ceil('30min')

df2['intervals'] = df2.apply(lambda x: pd.date_range(x.start_slot, x.stop_slot, freq='30min'), axis=1)
df3 = df2.explode('intervals')
df3['late_start'] = (df3.start>df3.intervals)*(df3.start-df3.intervals).dt.seconds/60
df3['soon_stop']= ((df3.stop>df3.intervals) & (df3.stop<(df3.intervals+pd.Timedelta('30min'))))*((df3.intervals+pd.Timedelta('30min'))-df3.stop).dt.seconds/60
df3['someone'] =  (df3.start<df3.intervals+pd.Timedelta('30min'))&(df3.stop>df3.intervals)#+pd.Timedelta('30min'))
df4 = df3.groupby('intervals').agg({'late_start':sum, 'soon_stop':sum, 'someone':sum})
df4['worked_time'] = df4.someone*30 - df4.late_start - df4.soon_stop

df4

>>> df4
                     late_start  soon_stop  someone  worked_time
intervals
2022-05-01 09:00:00         0.0        0.0        1         30.0
2022-05-01 09:30:00         7.0        0.0        2         53.0
2022-05-01 10:00:00         0.0        0.0        1         30.0
2022-05-01 10:30:00         0.0        0.0        1         30.0
2022-05-01 11:00:00         0.0        0.0        2         60.0
2022-05-01 11:30:00         0.0        0.0        2         60.0
2022-05-01 12:00:00         0.0        0.0        2         60.0
2022-05-01 12:30:00         0.0        0.0        2         60.0
2022-05-01 13:00:00         0.0        0.0        2         60.0
2022-05-01 13:30:00         0.0        0.0        2         60.0
2022-05-01 14:00:00         0.0        0.0        1         30.0
2022-05-01 14:30:00         0.0        0.0        1         30.0
2022-05-01 15:00:00         0.0        0.0        2         60.0
2022-05-01 15:30:00         0.0       22.0        2         38.0
2022-05-01 16:00:00         0.0        0.0        1         30.0
2022-05-01 16:30:00         0.0        0.0        1         30.0
2022-05-01 17:00:00         0.0        0.0        1         30.0
2022-05-01 17:30:00         0.0        0.0        1         30.0
2022-05-01 18:00:00         0.0        0.0        0          0.0

Zero, I've edited my question to show you what the desired output should be. — Damian, May 18 '22 at 18:54
I group by department and time (here the department doesn't matter) but yeah, don't need to separate workers — Damian, May 18 '22 at 19:03
Thought you gave me a genius idea, getting the time intervals like lists on the source — Damian, May 18 '22 at 19:11
@Damian finally I think I was able to solve it. It was a nice puzzle — Zaero Divide, May 19 '22 at 00:04

generate date_range for string intervals like 09:00-10:00/11:00-14:00/15:00-18:00 with BuiltIn functions for pandas

1 Answers1