0

I have a pandas data frame with 1 column and a time-based index. I want to resample the data for every two seconds and compute the average of the values in the column. Here is an example:

index = pd.date_range('1/1/2000', periods=10, freq='S')
data = {'value':[23,23,12,14,14,57,67,32,56,89]}
series = pd.DataFrame(data, index=index)

The above code gives this result.

enter image description here

Now, I compute the average of the values for every two seconds.

series['resample_value'] = 
series['value'].resample('2S').mean()

This gives me the result as shown in the image below.

enter image description here

But I would like to have the results as shown in image 3. I would like the computed average values to be put back in the original dataframe which was not resampled. How do I obtain it?

enter image description here

Thanks in advance.

EngGu
  • 459
  • 3
  • 14
  • Is my answer below providing what you need? – Anders Källmar Nov 12 '21 at 13:48
  • @AndersKällmar, Hey yes. It works out fine for this above-stated problem. But I found another solution as well, which is resampling and forward filling. – EngGu Nov 15 '21 at 10:49
  • I was considering that solution as well seeing your last edits, The forward filling variant is also a tiny bit faster, but it is really close (just made some tests). – Anders Källmar Nov 16 '21 at 13:20

1 Answers1

1

You can groupby the resampled value using floor, then calculate means for each group and broadcast this to the original rows using transform:

series['value'] = series.groupby(series.index.floor('2S')).value.transform('mean')

print(series)

                        value
2000-01-01 00:00:00      23.0
2000-01-01 00:00:01      23.0
2000-01-01 00:00:02      13.0
2000-01-01 00:00:03      13.0
2000-01-01 00:00:04      35.5
2000-01-01 00:00:05      35.5
2000-01-01 00:00:06      49.5
2000-01-01 00:00:07      49.5
2000-01-01 00:00:08      72.5
2000-01-01 00:00:09      72.5
Anders Källmar
  • 366
  • 1
  • 4