I have the following dataframe, df, and I would like to add the 'distance' column to it, such that:
date | active | distance |
---|---|---|
01/09/2022 | 1 | 0 |
02/09/2022 | 0 | 1 |
05/09/2022 | 0 | 2 |
06/09/2022 | 0 | 3 |
07/09/2022 | 0 | 4 |
08/09/2022 | 1 | 0 |
09/09/2022 | 0 | 1 |
Here, the distance is how far away each row is from the previous value of '1' in the active column, with the distance being the number of business days. I have tried using the following:
df['distance'] = np.where(
df['active'] == 1, 0, df['distance'].shift(1,fill_value=0).astype(int) + 1
)
But it seems that Python does not like me referencing a column as I am defining it. I tried to also define a function to run this but unsure how to do so using .shift()
as this command seems necessary in order to use to take the previous value and add to it.
Other variations of the above code do not seem to work since Python really wants to concatenate the shift and the 1 instead of just summing them together.