Given a DataFrame having timestamp (ts), I'd like to these by the hour (downsample). Values that were previously indexed by ts should now be divided into ratios based on the number of minutes left in an hour. [note: divide data in ratios for NaN columns while doing resampling]
ts event duration
0 2020-09-09 21:01:00 a 12
1 2020-09-10 00:10:00 a 22
2 2020-09-10 01:31:00 a 130
3 2020-09-10 01:50:00 b 60
4 2020-09-10 01:51:00 b 50
5 2020-09-10 01:59:00 b 26
6 2020-09-10 02:01:00 c 72
7 2020-09-10 02:51:00 b 51
8 2020-09-10 03:01:00 b 63
9 2020-09-10 04:01:00 c 79
def create_dataframe():
df = pd.DataFrame([{'duration':12, 'event':'a', 'ts':'2020-09-09 21:01:00'},
{'duration':22, 'event':'a', 'ts':'2020-09-10 00:10:00'},
{'duration':130, 'event':'a', 'ts':'2020-09-10 01:31:00'},
{'duration':60, 'event':'b', 'ts':'2020-09-10 01:50:00'},
{'duration':50, 'event':'b', 'ts':'2020-09-10 01:51:00'},
{'duration':26, 'event':'b', 'ts':'2020-09-10 01:59:00'},
{'duration':72, 'event':'c', 'ts':'2020-09-10 02:01:00'},
{'duration':51, 'event':'b', 'ts':'2020-09-10 02:51:00'},
{'duration':63, 'event':'b', 'ts':'2020-09-10 03:01:00'},
{'duration':79, 'event':'c', 'ts':'2020-09-10 04:01:00'},
{'duration':179, 'event':'c', 'ts':'2020-09-10 06:05:00'},
])
df.ts = pd.to_datetime(df.ts)
return df
I want to estimate a produced based on the ratio of time spend and produced. This can be compared to how many lines of code have been completed or find how many actual lines per hour?
for example: at "2020-09-10 00:10:00" we have 22. Then during the period from 21:01 - 00:10, we produced based on
59 min of 21:00 hours -> 7 => =ROUND(22/189*59,0)
60 min of 22:00 hours -> 7 => =ROUND(22/189*60,0)
60 min of 23:00 hours -> 7 => =ROUND(22/189*60,0)
10 min of 00:00 hours -> 1 => =ROUND(22/189*10,0)
the result should be something like.
ts event duration
0 2020-09-09 20:00:00 a NaN
1 2020-09-10 21:00:00 a 7
2 2020-09-10 22:00:00 a 7
3 2020-09-10 23:00:00 a 7
4 2020-09-10 00:00:00 a 1
5 2020-09-10 01:00:00 b ..
6 2020-09-10 02:01:00 c ..
Problem with this approach:
It appears to me that, we are having a serious issue with this approach. If you look at the rows[1] -> 2020-09-10 07:00:00, we have 4, we need to divide it between 3 hours. Considering base duration value as 1 (base unit), we however get
def create_dataframe2():
df = pd.DataFrame([{'duration':4, 'event':'c', 'c':'event3.5', 'ts':'2020-09-10 07:00:00'},
{'duration':4, 'event':'c', 'c':'event3.5', 'ts':'2020-09-10 10:00:00'}])
df.ts = pd.to_datetime(df.ts)
return df
Source
duration event c ts
0 4 c event3.5 2020-09-10 07:00:00
1 4 c event3.5 2020-09-10 10:00:00
Expected Output
ts_hourly mins duration
0 2020-09-10 07:00:00 60.0 2
1 2020-09-10 08:00:00 60.0 1
2 2020-09-10 09:00:00 60.0 1
3 2020-09-10 10:00:00 0.0 0