
Assume some measurement data (in reality given about every minute) named logData:

import pandas as pd, numpy as np

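# Toy measurements every 15 minutes (the real data arrives about every minute)
idxData = pd.to_datetime(['08:00', '08:15', '08:30', '08:45', '09:00'])
logData = pd.DataFrame(np.array([1.0, 2.0, 3.0, 4.0, 5.0]), columns=['val'], index=idxData)
# Half-hour bins, closed on the right: (08:00, 08:30] and (08:30, 09:00],
# so the 08:00 sample itself falls outside every bin
idxRng  = pd.interval_range(idxData[0], idxData[-1], freq='30min')
# Average per bin; the result is indexed by a CategoricalIndex of intervals
avgData = logData.groupby( pd.cut(logData.index, idxRng) ).mean()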

The grouped result, avgData, looks like this:

                      val
(08:00:00, 08:30:00]  2.5
(08:30:00, 09:00:00]  4.5

After some further calculations, this downsampled avgData should be upsampled again, e.g. to freq='10min', for further processing. However, avgData.resample('10min') raises the following error, so how can data with a categorical index be resampled?

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'CategoricalIndex'
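A minimal reproduction, sketched under the assumption that the timestamps carry a fixed date (2024-01-01, chosen only for illustration), shows where the CategoricalIndex comes from: grouping by pd.cut() labels each row with its interval, so the grouped index is categorical with an IntervalIndex as its categories, and resample() rejects it before doing any work. The name error_message is a hypothetical helper variable.

```python
import pandas as pd

# Same toy data as in the question, with an assumed date (2024-01-01)
idxData = pd.to_datetime(['2024-01-01 08:00', '2024-01-01 08:15', '2024-01-01 08:30',
                          '2024-01-01 08:45', '2024-01-01 09:00'])
logData = pd.DataFrame({'val': [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idxData)
idxRng = pd.interval_range(idxData[0], idxData[-1], freq='30min')

# Grouping by pd.cut() labels each row with its interval, so the result is
# indexed by a CategoricalIndex whose categories form an IntervalIndex
avgData = logData.groupby(pd.cut(logData.index, idxRng), observed=False).mean()
print(type(avgData.index).__name__)             # CategoricalIndex
print(type(avgData.index.categories).__name__)  # IntervalIndex

# resample() checks the index type up front and raises a TypeError
error_message = None
try:
    avgData.resample('10min')
except TypeError as err:
    error_message = str(err)
print(error_message)
```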

Many thanks in advance!

Pontis

2 Answers


For resample to work, your index needs to be a DatetimeIndex (dtype datetime64[ns]); as the error message says, a TimedeltaIndex or PeriodIndex also works. Check the dtype of your index by running the code below.

avgData.index.dtype
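As a sketch of what that check reveals, the snippet below builds a hypothetical stand-in for avgData (the date 2024-01-01 and the name midpoints are assumptions for illustration) and compares the dtype of the grouped index with the dtype obtained after replacing each interval by its midpoint.

```python
import pandas as pd

# Hypothetical stand-in for avgData; the date 2024-01-01 is an assumption,
# only the times matter here
bins = pd.interval_range(pd.Timestamp('2024-01-01 08:00'),
                         pd.Timestamp('2024-01-01 09:00'), freq='30min')
avgData = pd.DataFrame({'val': [2.5, 4.5]}, index=pd.CategoricalIndex(bins))

# The grouped index is categorical, which resample() rejects
print(avgData.index.dtype)   # category

# The interval midpoints form a DatetimeIndex, which resample() accepts
midpoints = pd.DatetimeIndex(avgData.index.categories.mid)
print(midpoints.dtype)
```

The second print shows a datetime64 dtype; the exact resolution shown may depend on the pandas version.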

It took me a little while to figure out how to meaningfully convert a categorical index, but index.categories.mid, which gives the midpoint of each interval, seems to work and allows resampling the data via

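# Replace each interval by its midpoint, giving a proper DatetimeIndex
# (assumes one row per category, in category order)
avgData.set_index( pd.DatetimeIndex( avgData.index.categories.mid ), inplace=True)
# Upsample to 5 minutes and fill the new rows from the nearest value (needs SciPy)
avgData = avgData.resample('5min').interpolate(method='nearest')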

which yields the expected result:

          val
08:15:00  2.5
08:20:00  2.5
08:25:00  2.5
08:30:00  2.5
08:35:00  4.5
08:40:00  4.5
08:45:00  4.5
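Two possible variations, sketched under assumptions: the stand-in data and its date are made up, and midData, up_nearest and up_ffill are hypothetical names. First, the sketch takes the midpoint of each row's own interval instead of categories.mid, which only lines up when every category appears exactly once and in order. Second, it upsamples to the question's 10-minute frequency with Resampler.nearest() or Resampler.ffill(), which do not need SciPy, unlike interpolate(method='nearest'). The resample parameter origin='start' anchors the grid at the first midpoint; with the default origin the 10-minute labels would fall on whole ten-minute marks such as 08:10 and 08:20.

```python
import pandas as pd

# Hypothetical stand-in for the grouped avgData (the date 2024-01-01 is assumed)
bins = pd.interval_range(pd.Timestamp('2024-01-01 08:00'),
                         pd.Timestamp('2024-01-01 09:00'), freq='30min')
avgData = pd.DataFrame({'val': [2.5, 4.5]}, index=pd.CategoricalIndex(bins))

# Take the midpoint of each row's own interval, so the mapping does not depend
# on the rows being in the same order as avgData.index.categories
midData = avgData.copy()
midData.index = pd.DatetimeIndex([interval.mid for interval in avgData.index])

# origin='start' anchors the 10-minute grid at the first midpoint (08:15).
# Nearest neighbour without SciPy; on an exact tie pandas picks the later value
up_nearest = midData.resample('10min', origin='start').nearest()
# Step function: hold each average until the next one
up_ffill = midData.resample('10min', origin='start').ffill()

print(up_nearest)
print(up_ffill)
```

With this stand-in, nearest() gives 2.5 at 08:15 and 08:25 and 4.5 at 08:35 and 08:45, while ffill() keeps 2.5 up to 08:35. Which fill is appropriate depends on whether the averages should be treated as centred on the interval or as holding until the next one.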
Pontis