I have a dataset with a reading (Tank Level) every minute from a piece of equipment and want to create a new dataset (dataframe) with a count of the number of samples per day and the number of readings above a set value.
Noxious Tank Level.MIN Noxious Tank Level.MAX Date_Time
0 9.32 9.33 2019-12-31 05:01:00
1 9.32 9.34 2019-12-31 05:02:00
2 9.32 9.35 2019-12-31 05:03:00
3 9.31 9.35 2019-12-31 05:04:00
4 9.31 9.35 2019-12-31 05:05:00
... ... ... ...
528175 2.98 3.01 2020-12-31 23:56:00
528176 2.98 3.02 2020-12-31 23:57:00
528177 2.98 3.01 2020-12-31 23:58:00
528178 2.98 3.02 2020-12-31 23:59:00
528179 2.98 2.99 2021-01-01 00:00:00
Using a lamdba function I can see whether each value is an overflow (Tank Level > setpoint) - I have also indexed the dataframe by Date_Time:
df['Overflow'] = df.apply(lambda x: True if x['Noxious Tank Level.MIN'] > 89 else False , axis=1)
Noxious Tank Level.MIN Noxious Tank Level.MAX Overflow
Date_Time
2019-12-31 05:01:00 9.32 9.33 False
2019-12-31 05:02:00 9.32 9.34 False
2019-12-31 05:03:00 9.32 9.35 False
2019-12-31 05:04:00 9.31 9.35 False
2019-12-31 05:05:00 9.31 9.35 False
... ... ... ...
2020-12-31 23:56:00 2.98 3.01 False
2020-12-31 23:57:00 2.98 3.02 False
2020-12-31 23:58:00 2.98 3.01 False
2020-12-31 23:59:00 2.98 3.02 False
2021-01-01 00:00:00 2.98 2.99 False
Now I want to count the number of samples per day and the number of 'True' values in the Overflow
column to work out what fraction per day is in Overflow
I get the feeling that resample
or groupby
will be the way to go but I can't figure out how to create a new dataset with just these counts and include the conditional count from the Overflow
column