I have a dataframe (below) holding time series data that records when lights were turned on, left on, and then turned off. I created a series of calculated columns in my dataframe to identify each cluster (a continuous run of "On" rows). What I would like to do is sum the total length of time of each cluster, but store that value only at the earliest time (the first "On") of that cluster. The desired outcome would look like the "Length" column ("ClusterLength" in the code below).
I have all the necessary components to create the cluster but cannot figure out the code to sum the timesteps of that particular cluster and store the value at the earliest time.
ANY help would be greatly appreciated!
The data is also here. (I was unable to replace the 0s in the "Cluster" and "ClusterLength" columns with NaNs, but they should be NaNs. The image is correct.)
import pandas as pd
import numpy as np
data = {
    'Series': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A'],
    'Time': [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5],
    'TimeStep': [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.5],
    'Light': ["Off", "Off", "On", "On", "On", "Off", "Off", "Off", "On", "On", "On", "On", "On", "Off", "Off"],
    'Cluster': [0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0],
    'ClusterLength': [0, 0, 1.5, 0, 0, 0, 0, 0, 2.5, 0, 0, 0, 0, 0, 0],
}
df = pd.DataFrame(data, columns=['Series', 'Time', 'TimeStep', 'Light', 'Cluster', 'ClusterLength'])
df
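To make the goal concrete, here is a minimal sketch of one possible way to produce the desired column. It is not a definitive solution: it assumes a single series with rows already sorted by time, and the helper names `run_id`, `run_total`, and `is_first_on` are hypothetical, introduced only for illustration. The idea is to label each consecutive run of equal "Light" values, sum "TimeStep" within each run, and keep that sum only on the first "On" row of the run.

```python
import numpy as np
import pandas as pd

# Same Time/TimeStep/Light values as in the question (single series assumed).
df = pd.DataFrame({
    "Time": [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5],
    "TimeStep": [0.5] * 15,
    "Light": ["Off", "Off", "On", "On", "On", "Off", "Off", "Off",
              "On", "On", "On", "On", "On", "Off", "Off"],
})

# A new run starts wherever Light differs from the previous row;
# the cumulative sum of those change points gives each run its own id.
run_id = df["Light"].ne(df["Light"].shift()).cumsum()

# Total TimeStep per run, broadcast back onto every row of that run.
run_total = df.groupby(run_id)["TimeStep"].transform("sum")

# True only on the first row of each "On" run.
is_first_on = df["Light"].eq("On") & run_id.ne(run_id.shift())

# Keep the total on the first "On" row; every other row becomes NaN.
df["ClusterLength"] = run_total.where(is_first_on)
print(df)
```

With more than one value in "Series", the run labelling and the sum would presumably also need to be grouped by "Series"; that case is not handled in this sketch.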