0

I've been reading the forum, investigating on internet. But can't figure out how to apply a pandas functions to resume this whole code:

def get_time_and_date(schedule, starting_date, position):
    # calculate time and date for each start and ending time if the ending time < starting time, add one day to the ending.
    my_time = datetime.strptime(schedule.split('-')[position], '%H:%M')
    my_date = datetime.strptime(starting_date, '%Y-%m-%d')
    # get the starting hour for the range if we are calculating last interval
    if position == 1: 
        starting_hour = datetime.strptime(schedule.split('-')[0], '%H:%M')
        starting_hour = datetime(my_date.year, my_date.month, my_date.day, starting_hour.hour, 0)
    # convert unify my_time and my_date normalizing the minutes
    if hora.minute >= 30:
        my_hour_and_date = datetime(my_date.year, my_date.month, my_date.day, my_time.hour, 30)
    else:
        my_hour_and_date = datetime(my_date.year, my_date.month, my_date.day, hora.hour, 0)
    # if final time of the day < than starting time, means there is a day jump, so we add a day
    if position == 1 and my_hour_and_date < starting_hour: my_hour_and_date += timedelta(days=1)
    return my_hour_and_date


def get_time_interval_ranges(schedule, my_date):
    # get all match schedules if there are any
    schedules = schedule.split('/')
    intervals_list = []        
    # loop through al the schedules and add the range with the split separator "Separa aquí"
    for my_schedule in schedules:
        current_range = pd.date_range(start=get_time_and_date(my_schedule, my_date, 0), end=get_time_and_date(my_schedule, my_date, 1), freq="30min").strftime('%Y-%m-%d, %H:%M').to_list()        
        intervals_list += current_range
        intervals_list.append('separate_range_here')  
    return intervals_list


def generate_time_intervals(df, column_to_process, new_column):
    #generate range of times column
    df[new_column] = df.apply(lambda row: get_time_interval_ranges(row[column_to_process], row['my_date'], True), axis=1)
    return df

I believe there is a better way to do this, but I can't find out how. What I'm giving to the first function(generate_time_intervals) is a dataFrame with some columns but only Date (yyyy-mm-dd) and schedule are important.

When the schedule is 09:00-15:00 it's easy, just split by the "-" and give it to the builtint function data_range. The problem comes to handle horrendous times like the one on the title or the likes of 09:17-16:24.

Is there any way to handle this without so much looping and the sorts in my code?

Edit:

With this input:

Worker Date Schedule
Worker1 2022-05-01 09:00-10:00/11:00-14:00/15:00-18:00
Worker2 2022-05-01 09:37-15:38

I would like this output:

Date Interval Working Minutes
2022-05-01 09:00 30
2022-05-01 09:30 53
2022-05-01 10:00 30
2022-05-01 10:30 30
2022-05-01 11:00 60
2022-05-01 11:30 60
2022-05-01 12:00 60
2022-05-01 12:30 60
2022-05-01 13:00 60
2022-05-01 13:30 60
2022-05-01 14:00 30
2022-05-01 14:30 30
2022-05-01 15:00 60
2022-05-01 15:30 38
2022-05-01 16:00 30
2022-05-01 16:30 30
2022-05-01 17:00 30
2022-05-01 17:30 30
2022-05-01 18:00 0
Damian
  • 5,152
  • 1
  • 10
  • 21
  • Something that comes to mind is to convert them to `epoch`, round accordingly and convert back to datetime. See [converting datetime to epoch](https://stackoverflow.com/questions/35630098/convert-a-column-of-datetimes-to-epoch-in-python) – Zaero Divide May 18 '22 at 18:27

1 Answers1

1

Working with datetime:

df= pd.DataFrame({'schedule':['09:17-16:24','19:40-21:14']})
schedules = df.schedule.str.split('-',expand=True)
start = pd.to_datetime(schedules[0]).dt.round('H')
end = pd.to_datetime(schedules[1]).dt.round('H')

df['interval_out'] = start.dt.hour.astype(str) + ':00 - ' + end.dt.hour.astype(str) + ':00'

And result:

>>> df
      schedule
0  09:17-16:24
1  19:40-21:14

>>> schedules
       0      1
0  09:17  16:24
1  19:40  21:14

>>> start
0   2022-05-18 09:00:00
1   2022-05-18 20:00:00
Name: 0, dtype: datetime64[ns]
>>> end
0   2022-05-18 16:00:00
1   2022-05-18 21:00:00
Name: 1, dtype: datetime64[ns]

>>> df
      schedule   interval_out
0  09:17-16:24   9:00 - 16:00
1  19:40-21:14  20:00 - 21:00
>>>

Of course the rounding should be floor & ceil if you want to expand it...

EDIT: Trying the original question... It also helps if you read about datetime functions in Pandas (which now I learnt...):facepalm:

  • Expand the blocks into individual items start/stop
  • Floor / ceil them for the start/stop
  • Calculate the intervals using a convenient pandas function...
  • Explode the intervals as rows
  • Calculate the late start
  • Calculate soon stop
  • Calculate how many people were actually in the office
  • Group data on slots, adding lost minutes and worked minutes * worker
  • Do the calculation
df['timeblocks']= df.Schedule.str.split('/')
df2 = df.explode('timeblocks')
timeblocks = df2.timeblocks.str.split('-',expand=True)
df2['start'] = pd.to_datetime(df2.Date + " " + timeblocks[0])
df2['stop'] = pd.to_datetime(df2.Date + " " + timeblocks[1])
df2['start_slot'] = df2.start.dt.floor('30min')
df2['stop_slot'] = df2.stop.dt.ceil('30min')

df2['intervals'] = df2.apply(lambda x: pd.date_range(x.start_slot, x.stop_slot, freq='30min'), axis=1)
df3 = df2.explode('intervals')
df3['late_start'] = (df3.start>df3.intervals)*(df3.start-df3.intervals).dt.seconds/60
df3['soon_stop']= ((df3.stop>df3.intervals) & (df3.stop<(df3.intervals+pd.Timedelta('30min'))))*((df3.intervals+pd.Timedelta('30min'))-df3.stop).dt.seconds/60
df3['someone'] =  (df3.start<df3.intervals+pd.Timedelta('30min'))&(df3.stop>df3.intervals)#+pd.Timedelta('30min'))
df4 = df3.groupby('intervals').agg({'late_start':sum, 'soon_stop':sum, 'someone':sum})
df4['worked_time'] = df4.someone*30 - df4.late_start - df4.soon_stop

df4

>>> df4
                     late_start  soon_stop  someone  worked_time
intervals
2022-05-01 09:00:00         0.0        0.0        1         30.0
2022-05-01 09:30:00         7.0        0.0        2         53.0
2022-05-01 10:00:00         0.0        0.0        1         30.0
2022-05-01 10:30:00         0.0        0.0        1         30.0
2022-05-01 11:00:00         0.0        0.0        2         60.0
2022-05-01 11:30:00         0.0        0.0        2         60.0
2022-05-01 12:00:00         0.0        0.0        2         60.0
2022-05-01 12:30:00         0.0        0.0        2         60.0
2022-05-01 13:00:00         0.0        0.0        2         60.0
2022-05-01 13:30:00         0.0        0.0        2         60.0
2022-05-01 14:00:00         0.0        0.0        1         30.0
2022-05-01 14:30:00         0.0        0.0        1         30.0
2022-05-01 15:00:00         0.0        0.0        2         60.0
2022-05-01 15:30:00         0.0       22.0        2         38.0
2022-05-01 16:00:00         0.0        0.0        1         30.0
2022-05-01 16:30:00         0.0        0.0        1         30.0
2022-05-01 17:00:00         0.0        0.0        1         30.0
2022-05-01 17:30:00         0.0        0.0        1         30.0
2022-05-01 18:00:00         0.0        0.0        0          0.0
Zaero Divide
  • 699
  • 2
  • 10