First, it is better practice to explicitly convert the 'timestamp' column to datetime type (it becomes a DatetimeIndex once set as the index):
import pandas as pd

df = pd.DataFrame({
    'timestamp': pd.to_datetime([
        '2013-03-01 08:01:00', '2013-03-01 08:02:00',
        '2013-03-01 08:03:00', '2013-03-01 08:04:00',
        '2013-03-01 08:05:00', '2013-03-01 08:06:00']),
    'Kind': ['A', 'B', 'A', 'B', 'A', 'B'],
    'Values': [1, 4.5, 2, 7, 5, 9]})
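As a quick check (just an illustration, not part of the recipe itself), the conversion can be confirmed via the dtypes:

df.dtypes  # 'timestamp' should now show as datetime64[ns], 'Values' as float64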
Note the changed values for kind B. Now, when you resample, mean() estimates the new value as the average of the two existing ones. It may happen that more than one new data point falls between existing ones, and pandas fills their values with NaN. You can use ffill() or bfill(), depending on which side of the time interval you wish to be closed. By default it is the left side, so bfill() is the choice.
df.set_index('timestamp').groupby('Kind').resample('1.5Min')['Values'].bfill().reset_index()
Out[1]:
Kind timestamp Values
0 A 2013-03-01 08:00:00 1.0
1 A 2013-03-01 08:01:30 2.0
2 A 2013-03-01 08:03:00 2.0
3 A 2013-03-01 08:04:30 5.0
4 B 2013-03-01 08:01:30 4.5
5 B 2013-03-01 08:03:00 7.0
6 B 2013-03-01 08:04:30 9.0
7 B 2013-03-01 08:06:00 9.0
It will use the next observed value to fill the NaNs.
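To see where those NaNs come from, here is a minimal sketch of the unfilled intermediate result, plus the ffill() alternative mentioned above (same df as before, output omitted):

# mean() leaves NaN in the empty 1.5-minute bins
# (e.g. kind A has no observation between 08:01:30 and 08:03:00)
df.set_index('timestamp').groupby('Kind').resample('1.5Min')['Values'].mean()

# ffill() would instead propagate the previous observation forward
df.set_index('timestamp').groupby('Kind').resample('1.5Min')['Values'].ffill().reset_index()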
If you wish to interpolate the values, and not just fill the gaps, use the transform(pd.Series.interpolate) combo. Here transform applies the interpolate() function to the resampled values. Try resampling at a higher frequency (say 10 seconds) and you will see the big difference between the two approaches; a sketch follows the output below.
df.set_index('timestamp').groupby('Kind').resample('1.5Min').mean().transform(pd.Series.interpolate).reset_index()
Out[2]:
Kind timestamp Values
0 A 2013-03-01 08:00:00 1.0
1 A 2013-03-01 08:01:30 1.5
2 A 2013-03-01 08:03:00 2.0
3 A 2013-03-01 08:04:30 5.0
4 B 2013-03-01 08:01:30 4.5
5 B 2013-03-01 08:03:00 7.0
6 B 2013-03-01 08:04:30 8.0
7 B 2013-03-01 08:06:00 9.0
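As a rough sketch of the 10-second comparison suggested above (the only change from the two commands already shown is the frequency string; output omitted):

# bfill: every new 10-second point just repeats the next observed value
df.set_index('timestamp').groupby('Kind').resample('10s')['Values'].bfill().reset_index()

# interpolate: the new points fall on a straight line between observations
df.set_index('timestamp').groupby('Kind').resample('10s').mean().transform(pd.Series.interpolate).reset_index()

With bfill() the result looks like a staircase, while interpolate() produces a linear ramp between the original minute marks.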