I want to resample from hours to half-hours. I use .ffill() in the example, but I've tested .asfreq() as an intermediate step too.

The goal is to get intervals of half hours where the hourly values are spread among the upsampled intervals, and I'm trying to find a general solution for any ranges with the same problem.

import pandas as pd

index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='H')
hourly = pd.Series(range(10, len(index)+10), index=index)
half_hourly = hourly.resample('30min').ffill() / 2

The hourly series looks like:

2018-10-10 00:00:00    10
2018-10-10 01:00:00    11
2018-10-10 02:00:00    12
Freq: H, dtype: int64

And the half_hourly:

2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
Freq: 30T, dtype: float64

The problem with the last one is that there is no row representing 02:30:00.

I want to achieve something that is:

2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
2018-10-10 02:30:00    6.0
Freq: 30T, dtype: float64

I understand that the hourly series ends at 02:00, so there is no reason to expect pandas to insert the last half hour by default. However, after reading a lot of deprecated/old posts, some newer ones, the documentation, and the cookbook, I still wasn't able to find a straightforward solution.

Lastly, I've also tried .mean(), but that didn't fill the NaNs, and .interpolate() didn't average by hour as I wanted it to.

My .ffill() / 2 almost works as a way to spread hours over half hours in this case, but it feels like a hack for a problem that I expect pandas already has a better solution for.

Thanks in advance.

Thomas Fauskanger

1 Answer

Your precise issue can be solved like this:

>>> import pandas as pd
>>> index = pd.date_range('2018-10-10 00:00', '2018-10-10 02:00', freq='H')
>>> hourly = pd.Series(range(10, len(index)+10), index=index)
>>> hourly.reindex(index.union(index.shift(freq='30min'))).ffill() / 2
2018-10-10 00:00:00    5.0
2018-10-10 00:30:00    5.0
2018-10-10 01:00:00    5.5
2018-10-10 01:30:00    5.5
2018-10-10 02:00:00    6.0
2018-10-10 02:30:00    6.0
Freq: 30T, dtype: float64
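
As a quick sanity check on the one-liner above, the sketch below rebuilds the same series and asserts that spreading keeps the hourly total unchanged while doubling the number of rows. It assumes every timestamp in the hourly index is present; the names `half_index` and `half_hourly` are only illustrative, and the start index is built with `periods=3, freq="60min"` rather than the `'H'` alias to sidestep alias differences between pandas versions.

```python
import pandas as pd

index = pd.date_range("2018-10-10 00:00", periods=3, freq="60min")
hourly = pd.Series(range(10, len(index) + 10), index=index)

# Add a timestamp 30 minutes after every hourly one, then forward-fill and halve.
half_index = index.union(index.shift(freq="30min"))
half_hourly = hourly.reindex(half_index).ffill() / 2

# Spreading should double the row count and leave the total untouched.
assert len(half_hourly) == 2 * len(hourly)
assert half_hourly.sum() == hourly.sum()
```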

I suspect this is a minimal example, so I will try to solve it generically as well. Let's say you have multiple points to fill in each day:

>>> import pandas as pd
>>> x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
>>> x.resample('6h').ffill()
2018-09-21 00:00:00    1.5
2018-09-21 06:00:00    1.5
2018-09-21 12:00:00    1.5
2018-09-21 18:00:00    1.5
2018-09-22 00:00:00    2.5
Freq: 6H, dtype: float64

A similar trick includes 06:00, 12:00 and 18:00 on 2018-09-22 as well.

Reindex with a shift equal to the length of the final interval you want to include. In this case the shift is one extra day:

>>> import pandas as pd
>>> x = pd.Series([1.5, 2.5], pd.DatetimeIndex(['2018-09-21', '2018-09-22']))
>>> res = x.reindex(x.index.union(x.index.shift(freq='1D'))).resample('6h').ffill()
>>> res[:res.last_valid_index()]  # drop the start of next day
2018-09-21 00:00:00    1.5
2018-09-21 06:00:00    1.5
2018-09-21 12:00:00    1.5
2018-09-21 18:00:00    1.5
2018-09-22 00:00:00    2.5
2018-09-22 06:00:00    2.5
2018-09-22 12:00:00    2.5
2018-09-22 18:00:00    2.5
Freq: 6H, dtype: float64
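
The two cases above could be wrapped in one small helper. This is only a sketch under stated assumptions, not a built-in pandas feature: `spread_evenly`, `source_step`, `target_step` and `split` are hypothetical names, the input is assumed to be sorted and regularly spaced with no missing timestamps, and the target step must divide the source step exactly. Instead of the union/shift trick it builds the full target index with `pd.date_range` and forward-fills through `reindex(..., method="ffill")`.

```python
import pandas as pd

def spread_evenly(series, source_step, target_step, split=True):
    """Upsample `series` so its last value also covers its whole source interval.

    Assumes a sorted, gap-free index spaced `source_step` apart (hypothetical helper).
    """
    source_step = pd.Timedelta(source_step)
    target_step = pd.Timedelta(target_step)
    ratio = source_step // target_step  # Timedelta // Timedelta gives an int
    if ratio * target_step != source_step:
        raise ValueError("target_step must divide source_step exactly")
    # Stop one target step before the end of the last source interval.
    full_index = pd.date_range(
        start=series.index[0],
        end=series.index[-1] + source_step - target_step,
        freq=target_step,
    )
    filled = series.reindex(full_index, method="ffill")
    # Optionally share each source value equally among its sub-intervals.
    return filled / ratio if split else filled

index = pd.date_range("2018-10-10 00:00", periods=3, freq="60min")
hourly = pd.Series(range(10, len(index) + 10), index=index)
result = spread_evenly(hourly, "1h", "30min")  # ends at 02:30 with value 6.0
```

Passing `split=False` gives the plain forward-fill of the daily example, for instance `spread_evenly(x, "1D", "6h", split=False)`.
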
Alexander McFarlane