0

i have some issues with my dataresampling in pandas. I´m trying to upsample 15 min values to 1min values. The resampled dataframe values shoud contain the sum spliited equaly between the two values of the original dataframe. This codes generates an extraction of the problem.

import pandas as pd 
import numpy as np


dates = pd.DataFrame(pd.date_range(start="20190101",end="20200101", freq="15min"))
values = pd.DataFrame(np.random.randint(0,10,size=(35041, 1)))

df = pd.concat([dates,values], axis = 1)
df = df.set_index(pd.DatetimeIndex(df.iloc[:,0]))

print(df.resample("min").agg("sum").head(16))

This is an example output:

2019-01-01 00:00:00  3
2019-01-01 00:01:00  0
2019-01-01 00:02:00  0
2019-01-01 00:03:00  0
2019-01-01 00:04:00  0
2019-01-01 00:05:00  0
2019-01-01 00:06:00  0
2019-01-01 00:07:00  0
2019-01-01 00:08:00  0
2019-01-01 00:09:00  0
2019-01-01 00:10:00  0
2019-01-01 00:11:00  0
2019-01-01 00:12:00  0
2019-01-01 00:13:00  0
2019-01-01 00:14:00  0
2019-01-01 00:15:00  3

The values shown as 0 should be replaced by the sum of the two values (in this exapmle: 2019-01-01 00:00:00 3; and 2019-01-01 00:15:00 3) which equals to 6 and this should be evenly distibuted over the timearea.

2019-01-01 00:00:00  6/15
2019-01-01 00:01:00  6/15
2019-01-01 00:02:00  6/15
2019-01-01 00:03:00  6/15
2019-01-01 00:04:00  6/15
2019-01-01 00:05:00  6/15
2019-01-01 00:06:00  6/15
2019-01-01 00:07:00  6/15
2019-01-01 00:08:00  6/15
2019-01-01 00:09:00  6/15
2019-01-01 00:10:00  6/15
2019-01-01 00:11:00  6/15
2019-01-01 00:12:00  6/15
2019-01-01 00:13:00  6/15
2019-01-01 00:14:00  6/15
2019-01-01 00:15:00  6/15

This should be done for each resampled group over the whole Dataframe. In other word the sum of the original dataframe and the resampled dataframe should be equal. Thanks for your help.

Lukas
  • 3
  • 2

1 Answers1

0

First of all, personally, I would recommend working with a series, if there is only one column.

series = pd.Series(index=pd.date_range(start="20190101",end="20200101", 
         freq="15min"), data=(np.random.randint(0,10,size=(35041,))).tolist())

 Then, I would create a new index with minutely values, calculate the cumulative sum of the values and interpolate between these values. In your use case "linear" is suggested as interpolation method:

beginning = series.index[0]
end = series.index[-1]
new_index = pd.date_range(start, end, freq="1T")

cumsum = series.cumsum()
cumsum = result.reindex(new_index)
cumsum = result.interpolate("linear")

Afterwards, you get an interpolated cumulative sum, which you can convert back to your searched values via:

series_upsampled = cumsum.diff()

If you want, you can shift the series_upsampled by 1, doing

series_upsampled = series_upsampled.shift(-1)

Pay attention to NaN value at the beginning (or if you shift your series, at the end).