0

I have the following second-resolution dataframe:

                     timestamp     value
------------------------------------------
0    2015-02-21 03:42:35+00:00        45 
1    2015-02-21 03:42:36+00:00        46 
2    2015-02-21 03:42:37+00:00        49  
3    2015-02-21 03:42:38+00:00        55
4    2015-02-21 03:42:39+00:00        59
5    2015-02-21 03:42:40+00:00        54
...

This dataframe was created after running:

df['timestamp'] = pd.to_datetime(df['timestamp'], utc=True)

What I want to do is take my seconds resolution timestamps, and then resample as milliseconds, and then fill in those new millisecond timestamps with interpolated (linear interpolation) values, so I will be left with a dataframe of now millisecond-resolution data. I tried the following code:

upsampled = df.resample('ms')
interpolated = upsampled.interpolate(method='linear')
interpolated.head()

And I receive the following error:

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'

from this line:

upsampled = df.resample('ms')

How can I resolve my dataframe issue so that the timestamps can be properly handled, resampled, and interpolated? Since it appears there is some sort of issue with how my timeseries data was read in. Also, if milliseconds is perhaps too high of a resolution to reasonably process, I would be fine with just fractions of seconds too, just something a bit higher-resolution than seconds.

1 Answers1

3

resample by default uses the index of your dataframe, so you can either:

  • Make this column the index via df = df.set_index('timestamp') and then resample.

or

  • Choose to resample on that column via df.resample('ms', on='timestamp')

Personally, I'd use a datetime index as there are some benefits to it:

df.set_index('timestamp', inplace=True)
df.resample('ms').interpolate('linear')

...

df.resample('ms').interpolate('linear')
                                   value
2015-02-21 03:42:35+00:00         45.000
2015-02-21 03:42:35.001000+00:00  45.001
2015-02-21 03:42:35.002000+00:00  45.002
2015-02-21 03:42:35.003000+00:00  45.003
2015-02-21 03:42:35.004000+00:00  45.004
...                                  ...
2015-02-21 03:42:39.996000+00:00  54.020
2015-02-21 03:42:39.997000+00:00  54.015
2015-02-21 03:42:39.998000+00:00  54.010
2015-02-21 03:42:39.999000+00:00  54.005
2015-02-21 03:42:40+00:00         54.000

[5001 rows x 1 columns]

You can also specify how many ms, if you don't want quite such a high resolution:

df.resample('50ms').interpolate('linear')

...

                                  value
2015-02-21 03:42:35+00:00         45.00
2015-02-21 03:42:35.050000+00:00  45.05
2015-02-21 03:42:35.100000+00:00  45.10
2015-02-21 03:42:35.150000+00:00  45.15
2015-02-21 03:42:35.200000+00:00  45.20
...                                 ...
2015-02-21 03:42:39.800000+00:00  55.00
2015-02-21 03:42:39.850000+00:00  54.75
2015-02-21 03:42:39.900000+00:00  54.50
2015-02-21 03:42:39.950000+00:00  54.25
2015-02-21 03:42:40+00:00         54.00

[101 rows x 1 columns]
BeRT2me
  • 12,699
  • 2
  • 13
  • 31
  • had me by 30 seconds :( – ddejohn Jun 13 '22 at 18:19
  • This is a great suggestion and address what I want, so thank you! I like the datetime index option, though the rest of my code (not mentioned here) depends on the timestamp column to be a column rather than the index, so I am trying your `df.resample('5ms', on='timestamp')` option. In trying this however, I received this error: `ValueError: Upsampling from level= or on= selection is not supported, use .set_index(...) to explicitly set index to datetime-like` from this line: `interpolated = upsampled.interpolate(method='linear')`. Do you know if this is because of an error with the resampling? – LostinSpatialAnalysis Jun 13 '22 at 19:14
  • 1
    As the error says, it looks like Upsampling using on= selection is not supported. My suggestion would be to make it the index for the resampling and then you can reset the datetime index back to a column with `reset_index()` – BeRT2me Jun 13 '22 at 21:16
  • Ok, thanks, this suggestion got it running, but I am not sure my resampling and interpolation actually occurred. I see in your posted output example, and timestamp of `2015-02-21 03:42:35+00:00 `, would then have `2015-02-21 03:42:35.050000+00:00` and `2015-02-21 03:42:35.100000+00:00 45.10` and so on after it. However, in the df I am producing, I still see second resolution, so I am seeing `2015-02-21 03:42:35+00:00` and then `2015-02-21 03:42:36+00:00` and then `2015-02-21 03:42:38+00:00` and so on. I tried `df.resample('5ms').interpolate('linear')` and then `df.reset_index(inplace=True)` – LostinSpatialAnalysis Jun 13 '22 at 22:14
  • I fixed it. I just set `df_masked_new = df_masked.resample('5ms').interpolate('linear')` and now I see those millisecond intervals. I guess I just needed to send the newly resampled and interpolated df to a new variable to actually see the new product. – LostinSpatialAnalysis Jun 13 '22 at 22:19