I need to merge the overlapping intervals for each task in my CSV file. Here is my sample data:

Tasks Start End
Read 10:00 12:00
Read 11:10 13:00
Read 13:50 14:00
Write 14:00 14:30
Write 11:00 15:00

The expected output is:

Read - [10:00,13:00], [13:50,14:00]

Write - [14:00,15:00]

I found a similar post that does the same thing with pandas, but I am not allowed to use pandas here:

Efficient merge overlapping intervals in same pandas dataframe with start and finish columns

I am trying to use either a plain file reader or the csv reader to parse the file, sort the rows, and compare adjacent rows for overlapping intervals, but I am having trouble parsing the file:

import csv

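intervals = {}  # task name -> list of (start, end) pairs; avoids shadowing the built-in dict
with open('sample.csv', mode='r', newline='') as csv_file:
    # DictReader expects commas by default; pass delimiter=' ' if the columns are space-separated
    csv_reader = csv.DictReader(csv_file)
    for row in csv_reader:
        name = row["Tasks"]
        start_time = row["Start"]
        end_time = row["End"]
        # append instead of assigning, so earlier rows of the same task are not overwritten
        intervals.setdefault(name, []).append((start_time, end_time))
print(intervals)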

Any help will be appreciated.

1 Answer

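Assuming the times are sorted, you can compute custom groups and aggregate:

import pandas as pd

# df holds the CSV data with columns Tasks, Start, End
# get next start per task
next_start = pd.to_timedelta(df['Start']+':00').groupby(df['Tasks']).shift(-1)
# get current end
end = pd.to_timedelta(df['End']+':00')

# if next start ≥ current end, per task, the next row starts a new interval;
# shift that flag down one row per task so the group id changes on that next row
group = next_start.ge(end).groupby(df['Tasks']).shift(fill_value=False).cumsum()
out = (df.groupby(['Tasks', group], as_index=False)
         .agg({'Start': 'first', 'End': 'last'})
       )

output:

   Tasks  Start    End
0   Read  10:00  13:00
1   Read  13:50  14:00
2  Write  14:00  15:00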

NB. I used timedelta objects for the comparison; since zero-padded HH:MM strings compare in the same order, you could also use the strings directly with:

next_start = df['Start'].groupby(df['Tasks']).shift(-1, fill_value='00:00')
end = df['End']
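
Since the question rules out pandas, the same idea can also be sketched with the standard library only. This is a minimal sketch, not a definitive implementation: the helper names `to_minutes`, `merge_intervals`, `merge_by_task` and `read_rows` are hypothetical, the times are assumed to be zero-padded HH:MM within a single day, and intervals that merely touch are assumed to stay separate. Unlike the snippet above, it sorts each task's intervals by start time first, so it does not rely on the input order.

```python
import csv
from collections import defaultdict


def to_minutes(hhmm):
    """Convert an 'HH:MM' string to minutes since midnight."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def merge_intervals(intervals):
    """Merge overlapping (start, end) pairs of 'HH:MM' strings."""
    merged = []
    for start, end in sorted(intervals, key=lambda iv: to_minutes(iv[0])):
        # strict overlap (assumption); use <= to also merge intervals that touch
        if merged and to_minutes(start) < to_minutes(merged[-1][1]):
            # extend the previous interval only if this one ends later
            if to_minutes(end) > to_minutes(merged[-1][1]):
                merged[-1][1] = end
        else:
            merged.append([start, end])
    return merged


def merge_by_task(rows):
    """Group rows (dicts with Tasks/Start/End keys) by task, then merge each group."""
    by_task = defaultdict(list)
    for row in rows:
        by_task[row["Tasks"]].append((row["Start"], row["End"]))
    return {task: merge_intervals(ivs) for task, ivs in by_task.items()}


def read_rows(path, delimiter=","):
    """Read the CSV into a list of dicts; the delimiter is an assumption about the file."""
    with open(path, newline="") as csv_file:
        return list(csv.DictReader(csv_file, delimiter=delimiter))


# The sample data from the question, inlined so the sketch runs without a file.
sample_rows = [
    {"Tasks": "Read", "Start": "10:00", "End": "12:00"},
    {"Tasks": "Read", "Start": "11:10", "End": "13:00"},
    {"Tasks": "Read", "Start": "13:50", "End": "14:00"},
    {"Tasks": "Write", "Start": "14:00", "End": "14:30"},
    {"Tasks": "Write", "Start": "11:00", "End": "15:00"},
]

for task, ivs in merge_by_task(sample_rows).items():
    print(f"{task} - " + ", ".join(f"[{s},{e}]" for s, e in ivs))
# prints:
# Read - [10:00,13:00], [13:50,14:00]
# Write - [11:00,15:00]
```

With a real file, `merge_by_task(read_rows('sample.csv'))` would replace the inlined rows. Because the intervals are sorted by start time, Write comes out as [11:00,15:00]: the 11:00 to 15:00 row already covers 14:00 to 14:30.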
mozway