Let's say I have a list of timestamps and associated indices of some event. I have M timestamps which are not evenly spaced. I would like to downsample this list to N < M in such a way that my timestamps will be more or less equally spaced, i.e. I don't want to just take the first N from M.

timestamps = 
[(Timestamp('2013-10-05 12:52:00+0000', tz='UTC'), 0),
 (Timestamp('2013-10-07 18:38:00+0000', tz='UTC'), 1),
 (Timestamp('2013-10-12 11:30:00+0000', tz='UTC'), 5),
 (Timestamp('2013-10-13 11:58:00+0000', tz='UTC'), 7),
 (Timestamp('2013-10-14 17:26:00+0000', tz='UTC'), 11),
 (Timestamp('2013-10-16 17:54:00+0000', tz='UTC'), 12),
 (Timestamp('2013-10-17 21:26:00+0000', tz='UTC'), 14),
 (Timestamp('2013-10-20 13:37:00+0000', tz='UTC'), 17),
 (Timestamp('2013-10-22 18:16:00+0000', tz='UTC'), 18),
 (Timestamp('2013-10-25 23:37:00+0000', tz='UTC'), 19),
 (Timestamp('2013-10-26 13:36:00+0000', tz='UTC'), 20),
 (Timestamp('2013-10-30 19:11:00+0000', tz='UTC'), 26),
 (Timestamp('2013-11-07 21:13:00+0000', tz='UTC'), 28),
 (Timestamp('2013-11-08 13:16:00+0000', tz='UTC'), 29),
 (Timestamp('2013-11-15 18:19:00+0000', tz='UTC'), 32),
 (Timestamp('2013-11-16 00:27:00+0000', tz='UTC'), 33),
 (Timestamp('2013-11-16 18:55:00+0000', tz='UTC'), 35),
 (Timestamp('2013-12-04 16:58:00+0000', tz='UTC'), 40),
 (Timestamp('2013-12-18 09:48:00+0000', tz='UTC'), 47),
 (Timestamp('2013-12-19 08:32:00+0000', tz='UTC'), 50)]

Let's say M=20, as in the example above, and N=15. Is there any smart/known way to do this? The only thing that comes to my mind is calculating the timedelta between each pair of consecutive timestamps and then trying some search/optimization/genetic algorithm based on maximising the median timedelta, so that we don't get multiple points that are very close in time and some that are far apart. Another option could be to generate N equally spaced points between start and end and match the closest existing timestamps to the artificial ones. I looked into pandas resample, but it does not downsample the way I want: I need exactly N elements left in my list. Any ideas, tips appreciated.

chess
  • I think M=20, no? – Corralien Sep 23 '21 at 21:54
  • You do **not** want the event indices mutated? – wwii Sep 23 '21 at 22:06
  • Do you have enough data to get a reliable standard deviation or variance for the deltas between samples? – wwii Sep 23 '21 at 23:11
  • *"I would like to downsample this list to N < M in such a way that my timestamps will be more or less equally spaced"* That's not going to happen unless the timestamps in the original list are equally spaced. And they are not evenly spaced in your example. What you have is conflicting requirements: 1) use timestamps from the input list, 2) use timestamps that are evenly spaced. One of those requirements must be dropped. Either you generate evenly spaced timestamps, and interpolate the data for those times, or you accept that the chosen timestamps are not evenly spaced (not even close). – user3386109 Sep 24 '21 at 01:43

1 Answer

*"Any ideas, tips"*

  • find the time span by subtracting the oldest timestamp from the newest
  • find an ideal period by dividing that span by N-1
  • add one period to the oldest sample
    • find the sample closest to that time
  • add two periods to the oldest sample
    • find the sample closest to that time
  • repeat ... until the newest sample is reached
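
The steps above can be sketched roughly as follows. This is only a minimal sketch, not a definitive implementation: the function name `downsample_nearest` is made up for illustration, and it uses plain `datetime` objects (pandas `Timestamp` supports the same subtraction and addition). One assumption goes beyond the steps as written: each target time takes the closest sample that has *not already been picked*, so that exactly N samples come back even when two target times share the same nearest neighbour.

```python
from datetime import datetime, timedelta


def downsample_nearest(samples, n):
    """Pick n (timestamp, index) pairs closest to n evenly spaced target times.

    Hypothetical helper; `samples` is a list of (timestamp, index) tuples.
    """
    ordered = sorted(samples)
    if n >= len(ordered):
        return ordered            # nothing to drop
    if n < 2:
        return ordered[:n]        # a period needs at least two target times

    start, end = ordered[0][0], ordered[-1][0]
    period = (end - start) / (n - 1)   # ideal spacing between chosen samples

    remaining = list(ordered)
    chosen = []
    for k in range(n):
        target = start + k * period
        # Assumption: only consider samples that have not been chosen yet.
        best = min(remaining, key=lambda s: abs(s[0] - target))
        remaining.remove(best)
        chosen.append(best)
    return sorted(chosen)


# Small made-up example: samples on days 0, 1, 2, 3 and 10.
base = datetime(2013, 10, 5)
samples = [(base + timedelta(days=d), i) for i, d in enumerate([0, 1, 2, 3, 10])]
picked = downsample_nearest(samples, 3)
print([i for _, i in picked])  # [0, 3, 4]
```

With N=3 the target times fall on days 0, 5 and 10, so the samples on days 0, 3 and 10 are selected. The chosen samples are only as evenly spaced as the input allows.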

If you have a bunch of samples at the beginning that are far apart and then a bunch of samples at the end that are close together, that won't work. Maybe go back through the selected samples and throw out any that are too close together, which reduces N.
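
A possible way to do that second pass is sketched below. The name `drop_close` and the `min_gap` threshold are assumptions for illustration; nothing above says what "too close" should mean, so the threshold has to be chosen by the caller. The sketch always keeps the earlier sample of a close pair, which is just one of several reasonable choices.

```python
from datetime import datetime, timedelta


def drop_close(selected, min_gap):
    """Drop samples that follow the previously kept one by less than min_gap.

    Hypothetical helper; `selected` is a list of (timestamp, index) tuples
    and `min_gap` is a timedelta chosen by the caller.
    """
    kept = []
    for sample in sorted(selected):
        # Keep the first sample, then only those far enough from the last kept one.
        if not kept or sample[0] - kept[-1][0] >= min_gap:
            kept.append(sample)
    return kept


# Made-up example: two pairs of samples only a few hours apart.
base = datetime(2013, 10, 5)
selected = [(base + timedelta(hours=h), i) for i, h in enumerate([0, 5, 120, 122, 240])]
pruned = drop_close(selected, timedelta(days=1))
print([i for _, i in pruned])  # [0, 2, 4]
```

The result can contain fewer than N samples, as noted above.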

wwii