I have a dataframe with two columns: `class` (0 or 1) and `time` (an integer). I need to append a third column, `diff`, holding the remaining time until the next class 0 row.
```python
import pandas as pd

df = pd.DataFrame([
    [1, 101], [1, 104],
    [0, 107], [0, 110], [0, 123],
    [1, 156],
    [0, 167]],
    columns=['class', 'time'])
```
- If a row has class 0, `diff` should be 0.
- If a row has class 1, `diff` should be the difference between its `time` and the `time` of the first upcoming row that has class 0.
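To pin the rule down, here is a plain-Python sketch that applies it to the sample data in a single backward pass. It assumes the rows are sorted by `time` (as they are in the sample), and the function name `remaining_time` is hypothetical. A class 1 row with no later class 0 row gets `None` here, whereas `.iloc[0]` in an `apply`-based version would raise an `IndexError` for it.

```python
def remaining_time(classes, times):
    """Hypothetical reference version of the rule, one backward pass (O(n)).

    Assumes `times` is sorted ascending, so the first upcoming class-0 row
    is the nearest class-0 row at or after the current position.
    """
    diffs = [None] * len(times)
    next_zero_time = None  # time of the nearest class-0 row seen so far (scanning backwards)
    for i in range(len(times) - 1, -1, -1):
        if classes[i] == 0:
            next_zero_time = times[i]  # a class-0 row is its own "next" row, so its diff is 0
        if next_zero_time is not None:
            diffs[i] = next_zero_time - times[i]
    return diffs


classes = [1, 1, 0, 0, 0, 1, 0]
times = [101, 104, 107, 110, 123, 156, 167]
print(remaining_time(classes, times))  # [6, 3, 0, 0, 0, 11, 0]
```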
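I can calculate it with `apply` and a lambda:

```python
# For each row, find the first class-0 row whose time is at or after this row's time
df['diff'] = df.apply(
    lambda x: df[(df['time'] >= x['time']) & (df['class'] == 0)]['time'].iloc[0] - x['time'],
    axis=1)
```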
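The filter inside the lambda, which selects the class 0 rows whose `time` is at or after the current row's, is evaluated for every row, and each evaluation scans the whole dataframe. The total work therefore grows quadratically with the number of rows, so I believe it is not efficient for big dataframes.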
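One vectorized sketch, assuming the rows are sorted by `time`: blank out `time` on class 1 rows with `Series.where`, back-fill with `Series.bfill` so every row picks up the time of the nearest class 0 row at or after it, and subtract. This is a single linear pass instead of one full scan per row.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 101], [1, 104], [0, 107], [0, 110], [0, 123], [1, 156], [0, 167]],
    columns=['class', 'time'],
)

# Keep `time` only on class-0 rows (NaN elsewhere), then back-fill so every row
# sees the time of the nearest class-0 row at or below it. Class-0 rows keep
# their own time, so their diff comes out as 0.
next_zero_time = df['time'].where(df['class'].eq(0)).bfill()

# Trailing class-1 rows with no later class-0 row end up as <NA>; the nullable
# Int64 dtype keeps the column integral despite that.
df['diff'] = (next_zero_time - df['time']).astype('Int64')
print(df)
```

If the dataframe is not already sorted by `time`, it would need a `sort_values('time')` first for this to match the `time >=` semantics.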
What would be a more efficient way to calculate this?
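An alternative sketch that mirrors the `time >=` lookup more literally swaps in a binary search: NumPy's `searchsorted` finds, for every row, the first class 0 time that is at or after the row's own time. It assumes the class 0 times are sorted ascending (true for the sample), costs O(n log m) for m class 0 rows, and leaves rows without a later class 0 row as `<NA>`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 101], [1, 104], [0, 107], [0, 110], [0, 123], [1, 156], [0, 167]],
    columns=['class', 'time'],
)

zero_times = df.loc[df['class'].eq(0), 'time'].to_numpy()  # assumed sorted ascending
times = df['time'].to_numpy()

# side='left' returns the first position whose value is >= the query, so
# class-0 rows find themselves and get a diff of 0.
idx = np.searchsorted(zero_times, times, side='left')

# idx == len(zero_times) means no upcoming class-0 row; leave those as <NA>.
found = idx < len(zero_times)
diff = pd.Series(pd.NA, index=df.index, dtype='Int64')
diff.loc[found] = zero_times[idx[found]] - times[found]
df['diff'] = diff
print(df)
```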