
We have a dataframe of break events that have occurred on a production line.

import pandas as pd

# Example dataframe
df = pd.DataFrame({
    'RSNCODE': ['300.306', '100.102', '300.306'],
    'BEGTIME': ['2022-06-08 22:21:47', '2022-06-22 14:00:00', '2022-07-25 21:19:22'],
    'ENDTIME': ['2022-06-10 00:05:40', '2022-06-30 04:23:32', '2022-07-26 17:41:21'],
    'Reason': ['Planned shutdown', 'Shiftpatterns / Not planned shift days', 'Planned shutdown'],
    'Break_duration': [25.731667, 182.392500, 20.366667],
    'month': ['2022-06', '2022-06', '2022-07'],
    'week': ['2022-06-06/2022-06-12', '2022-06-20/2022-06-26', '2022-07-25/2022-07-31']
})

We have extracted the week in which each break started into the 'week' column of the dataframe, but the obvious issue is that we lose information by doing so. The middle row has a break that lasted over 168 hours (the number of hours in a week), so we would want that row spread across two different weeks. It looks like roughly half of the middle break happened in the week '2022-06-20/2022-06-26', while the other half happened in the week '2022-06-27/2022-07-03', which is currently missing from the dataframe entirely.
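To make the information loss concrete, here is a minimal sketch of the lossy aggregation, using only the 'week' and 'Break_duration' columns from the example above:

```python
import pandas as pd

df = pd.DataFrame({
    'week': ['2022-06-06/2022-06-12', '2022-06-20/2022-06-26', '2022-07-25/2022-07-31'],
    'Break_duration': [25.731667, 182.392500, 20.366667],
})

# Grouping by the week the break *started* in credits all 182.39 hours
# to '2022-06-20/2022-06-26'; '2022-06-27/2022-07-03' never appears.
weekly = df.groupby('week')['Break_duration'].sum()
print(weekly)
```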

Is there a way to do this, keeping record of precise timings when aggregating over the week (or month)?

I have tried different ways of grouping by the week, but all of pandas' default methods run into the same issue in my case. This could probably be done by iterating over the rows with some control flow, but even then it seems like a tricky enough task that someone must have solved it before - my searches just turned up nothing.

1 Answer


You could explode each break into (at most) hour-long slices and tag every slice with the Monday-to-Sunday week it actually falls in:

import pandas as pd

df = pd.DataFrame({
    'RSNCODE': ['300.306', '100.102', '300.306'],
    'BEGTIME': ['2022-06-08 22:21:47', '2022-06-22 14:00:00', '2022-07-25 21:19:22'],
    'ENDTIME': ['2022-06-10 00:05:40', '2022-06-30 04:23:32', '2022-07-26 17:41:21'],
    'Reason': ['Planned shutdown', 'Shiftpatterns / Not planned shift days', 'Planned shutdown'],
    'Break_duration': [25.731667, 182.392500, 20.366667],
    'month': ['2022-06', '2022-06', '2022-07'],
    'week': ['2022-06-06/2022-06-12', '2022-06-20/2022-06-26', '2022-07-25/2022-07-31']
})

rows = []
for _, row in df.iterrows():
    start_time = pd.to_datetime(row['BEGTIME'])
    end_time = pd.to_datetime(row['ENDTIME'])
    # Walk the break in (at most) hour-long slices so each slice can be
    # attributed to the week and month it actually falls in; the final
    # slice is shorter than an hour if the duration isn't a whole number
    # of hours.
    hour_start = start_time
    while hour_start < end_time:
        hour_end = min(hour_start + pd.Timedelta(hours=1), end_time)
        # Monday-to-Sunday range of the week containing this slice
        week_start = hour_start.normalize() - pd.Timedelta(days=hour_start.weekday())
        week = f"{week_start:%Y-%m-%d}/{week_start + pd.Timedelta(days=6):%Y-%m-%d}"
        rows.append({
            'RSNCODE': row['RSNCODE'],
            'hour_start': hour_start,
            'hour_end': hour_end,
            'Reason': row['Reason'],
            'Break_duration': (hour_end - hour_start).total_seconds() / 3600,
            'month': hour_start.strftime('%Y-%m'),
            'week': week
        })
        hour_start = hour_end

# Create a new dataframe from the list of dictionaries
new_df = pd.DataFrame(rows)

print(new_df)

which returns

     RSNCODE          hour_start            hour_end            Reason  \
0    300.306 2022-06-08 22:21:47 2022-06-08 23:21:47  Planned shutdown   
1    300.306 2022-06-08 23:21:47 2022-06-09 00:21:47  Planned shutdown   
2    300.306 2022-06-09 00:21:47 2022-06-09 01:21:47  Planned shutdown   
3    300.306 2022-06-09 01:21:47 2022-06-09 02:21:47  Planned shutdown   
4    300.306 2022-06-09 02:21:47 2022-06-09 03:21:47  Planned shutdown   
..       ...                 ...                 ...               ...   
225  300.306 2022-07-26 13:19:22 2022-07-26 14:19:22  Planned shutdown   
226  300.306 2022-07-26 14:19:22 2022-07-26 15:19:22  Planned shutdown   
227  300.306 2022-07-26 15:19:22 2022-07-26 16:19:22  Planned shutdown   
228  300.306 2022-07-26 16:19:22 2022-07-26 17:19:22  Planned shutdown   
229  300.306 2022-07-26 17:19:22 2022-07-26 17:41:21  Planned shutdown   

     Break_duration    month                   week  
0          1.000000  2022-06  2022-06-06/2022-06-12  
1          1.000000  2022-06  2022-06-06/2022-06-12  
2          1.000000  2022-06  2022-06-06/2022-06-12  
3          1.000000  2022-06  2022-06-06/2022-06-12  
4          1.000000  2022-06  2022-06-06/2022-06-12  
..              ...      ...                    ...  
225        1.000000  2022-07  2022-07-25/2022-07-31  
226        1.000000  2022-07  2022-07-25/2022-07-31  
227        1.000000  2022-07  2022-07-25/2022-07-31  
228        1.000000  2022-07  2022-07-25/2022-07-31  
229        0.366389  2022-07  2022-07-25/2022-07-31  

[230 rows x 7 columns]
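With the break exploded into slices, summing Break_duration per week gives totals that respect week boundaries. A sketch with a small stand-in frame (hypothetical values, shaped like the exploded result) of rows straddling a week boundary:

```python
import pandas as pd

# Stand-in for a few exploded slices straddling a week boundary
# (hypothetical values, shaped like the exploded dataframe above).
new_df = pd.DataFrame({
    'week': ['2022-06-20/2022-06-26'] * 2 + ['2022-06-27/2022-07-03'] * 2,
    'Break_duration': [1.0, 1.0, 1.0, 0.5],
})

# Each week is now credited only with the hours that actually fell in it.
per_week = new_df.groupby('week')['Break_duration'].sum()
print(per_week)
```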