I have a dataframe df:
                       col1    col2     col3
2020-01-02 08:50:00   360.0  -131.0   -943.0
2020-01-02 08:52:01   342.0  -130.0  -1006.0
2020-01-02 08:55:04   321.0  -130.0   -997.0
...                     ...     ...      ...
2022-01-03 14:44:56  1375.0   -91.0   -728.0
2022-01-03 14:50:57  1381.0  -118.0   -692.0
2022-01-03 14:50:58  1382.0  -115.0   -697.0
2022-01-03 14:50:59  1390.0  -111.0   -684.0
2022-01-03 14:55:58  1442.0  -106.0   -691.0
I want a function that obtains the indices that are NOT within a specific time (e.g., 5 minutes) of each other.
For example:
masked_df = df.loc[time_mask(df.index, pd.Timedelta(minutes=5))]
masked_df:
                       col1    col2     col3
2020-01-02 08:50:00   360.0  -131.0   -943.0
2020-01-02 08:55:04   321.0  -130.0   -997.0
...                     ...     ...      ...
2022-01-03 14:44:56  1375.0   -91.0   -728.0
2022-01-03 14:50:57  1381.0  -118.0   -692.0
2022-01-03 14:55:58  1442.0  -106.0   -691.0
The function time_mask should start from the first index and then repeatedly pick the next index that is not within 5 minutes of the previously added index. Below is my iterative attempt to solve this problem (named get_clean_ix_from_rolling here):
import pandas as pd

def get_clean_ix_from_rolling(idx, time_delt):
    # Always keep the first timestamp, then keep each later timestamp
    # that is at least time_delt after the most recently kept one.
    prev_ix = idx[0]
    clean_ix = [prev_ix]
    for x in idx[1:]:
        if (x - prev_ix) >= time_delt:
            clean_ix.append(x)
            prev_ix = x
    return pd.to_datetime(clean_ix)
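To make the expected behaviour reproducible, here is a small self-contained check built from the last five rows shown above. The frame construction is only an illustration (the real df is much larger), and the use of df.loc to go from the kept indices to masked_df is an assumption about how the result is meant to be applied.

```python
import pandas as pd


def get_clean_ix_from_rolling(idx, time_delt):
    # Same iterative logic as in the attempt above.
    prev_ix = idx[0]
    clean_ix = [prev_ix]
    for x in idx[1:]:
        if (x - prev_ix) >= time_delt:
            clean_ix.append(x)
            prev_ix = x
    return pd.to_datetime(clean_ix)


# The last five rows of the frame shown above (illustrative subset only).
df = pd.DataFrame(
    {
        "col1": [1375.0, 1381.0, 1382.0, 1390.0, 1442.0],
        "col2": [-91.0, -118.0, -115.0, -111.0, -106.0],
        "col3": [-728.0, -692.0, -697.0, -684.0, -691.0],
    },
    index=pd.to_datetime(
        [
            "2022-01-03 14:44:56",
            "2022-01-03 14:50:57",
            "2022-01-03 14:50:58",
            "2022-01-03 14:50:59",
            "2022-01-03 14:55:58",
        ]
    ),
)

kept = get_clean_ix_from_rolling(df.index, pd.Timedelta(minutes=5))
masked_df = df.loc[kept]  # rows at 14:44:56, 14:50:57 and 14:55:58
print(masked_df)
```

The rows at 14:50:58 and 14:50:59 are dropped because they fall within 5 minutes of 14:50:57, which was the last index kept.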
How can I speed up my code above?
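One possible direction, given as a rough sketch rather than a benchmarked solution: each kept index depends on the previously kept one, so the selection cannot be expressed as a single vectorized comparison, but the loop can jump straight to the next candidate with a binary search instead of visiting every row. The helper name time_mask_searchsorted is hypothetical, and the sketch assumes the index is a sorted, timezone-naive DatetimeIndex and that time_delt is positive.

```python
import numpy as np
import pandas as pd


def time_mask_searchsorted(idx, time_delt):
    """Hypothetical helper: keep the first timestamp, then repeatedly jump to
    the first timestamp at least time_delt after the last kept one.

    Assumes idx is a sorted, timezone-naive DatetimeIndex and time_delt > 0.
    """
    delta = pd.Timedelta(time_delt).value  # interval in integer nanoseconds
    if delta <= 0:
        raise ValueError("time_delt must be positive")

    # Work on plain int64 nanoseconds to avoid creating Timestamp objects.
    values = idx.values.astype("datetime64[ns]").astype(np.int64)
    n = len(values)
    if n == 0:
        return idx

    keep = [0]
    pos = 0
    while True:
        # First position whose value is >= last kept value + delta.
        pos = int(np.searchsorted(values, values[pos] + delta, side="left"))
        if pos >= n:
            break
        keep.append(pos)
    return idx[keep]


# Usage on the eight timestamps visible in the question.
sample_idx = pd.to_datetime(
    [
        "2020-01-02 08:50:00",
        "2020-01-02 08:52:01",
        "2020-01-02 08:55:04",
        "2022-01-03 14:44:56",
        "2022-01-03 14:50:57",
        "2022-01-03 14:50:58",
        "2022-01-03 14:50:59",
        "2022-01-03 14:55:58",
    ]
)
result = time_mask_searchsorted(sample_idx, pd.Timedelta(minutes=5))
print(result)
```

The number of loop iterations here equals the number of kept indices rather than the number of rows, so any gain should be largest when most rows are dropped. If nearly every row is kept, this may not be faster than the plain loop.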