I have a pandas DataFrame or Series with timestamps at irregular intervals. I want to filter it so that any two consecutive remaining rows are at least a minimum distance apart, for example 20 ms. The distance may also be larger. In the example below, the original table is on the left and the filtered result I want is on the right.
idx  timestamp (original)            idx  timestamp (filtered)
317  2022-12-31 00:00:00.360         317  2022-12-31 00:00:00.360
318  2022-12-31 00:00:00.364         318
319  2022-12-31 00:00:00.368         319
320  2022-12-31 00:00:00.372         320  2022-12-31 00:00:00.372
321  2022-12-31 00:00:00.376         321
322  2022-12-31 00:00:00.380         322
323  2022-12-31 00:00:00.384         323
324  2022-12-31 00:00:00.388         324
325  2022-12-31 00:00:00.392         325  2022-12-31 00:00:00.392
326  2022-12-31 00:00:00.396   ->    326
327  2022-12-31 00:00:00.414         327  2022-12-31 00:00:00.414
328  2022-12-31 00:00:00.416         328
329  2022-12-31 00:00:00.420         329
330  2022-12-31 00:00:00.425         330
331  2022-12-31 00:00:00.428         331
332  2022-12-31 00:00:00.432         332
333  2022-12-31 00:00:00.438         333  2022-12-31 00:00:00.438
This is the code I have so far. It solves the problem, but it is just a simple for loop:
import pandas as pd

min_delta = pd.Timedelta(milliseconds=20)  # required minimum distance

# always keep the first timestamp
res = [timestamps[0]]
# iterate over all remaining timestamps
for dt in timestamps[1:]:
    # keep dt only if it lies at least min_delta after the last kept timestamp
    if dt - res[-1] >= min_delta:
        res.append(dt)
However, the DataFrame is very large and the simple for loop is too slow.
I need a faster solution. I have already experimented with .resample, .diff, vectorization and similar approaches, but never got the desired result. Does anyone have an idea how I could solve this efficiently? Thanks in advance for your ideas!
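One direction that might be worth exploring is sketched below. Each decision depends on the last kept row, not on the direct predecessor, which is probably why a plain .diff does not give the desired result. The sketch keeps the same greedy rule as the loop, but jumps from one kept row to the next with a binary search (np.searchsorted), so the Python loop runs once per kept row instead of once per input row. It assumes the timestamps are sorted in ascending order. The function name filter_min_distance and the sample data are hypothetical and chosen only for illustration, and the actual speed-up has not been benchmarked and will depend on how many rows are kept.

```python
import numpy as np
import pandas as pd


def filter_min_distance(timestamps, min_delta):
    """Return positions of the rows kept by the greedy minimum-distance rule.

    A row is kept if it lies at least `min_delta` after the previously
    kept row. Assumes `timestamps` is sorted in ascending order.
    (Hypothetical helper, not part of pandas.)
    """
    # Convert to plain int64 nanoseconds so no Timestamp objects are created.
    values = (
        pd.DatetimeIndex(timestamps)
        .values.astype("datetime64[ns]")
        .astype("int64")
    )
    step = pd.Timedelta(min_delta).value  # min_delta in nanoseconds
    if step <= 0:
        raise ValueError("min_delta must be positive")

    kept = []
    pos = 0
    n = len(values)
    while pos < n:
        kept.append(pos)
        # Binary search: first row at least `step` after the row just kept.
        pos = np.searchsorted(values, values[pos] + step, side="left")
    return np.asarray(kept, dtype=np.intp)


# Made-up sample data: millisecond offsets from an arbitrary start time.
offsets_ms = [0, 4, 8, 20, 24, 39, 41, 70]
ts = pd.Series(pd.Timestamp("2022-12-31") + pd.to_timedelta(offsets_ms, unit="ms"))

keep = filter_min_distance(ts, "20ms")
filtered = ts.iloc[keep]
print(filtered)
```

Because the function returns positions rather than values, the same result can be used to select rows from a DataFrame with df.iloc[keep]. If almost every row ends up being kept, this is unlikely to be much faster than the original loop.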