Given a DataFrame indexed by a 1h PeriodIndex, I would like to set the values 0 and 1 in a new column on the rows where each 5h period starts and ends, respectively.
Let's consider this input data for instance:
import pandas as pd
from random import seed, randint

# Hourly PeriodIndex covering roughly one month
p1h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00', freq='1h', name='p1h')
# Fixed seed so the random values are reproducible
seed(1)
values = [randint(0, 10) for p in p1h]
df = pd.DataFrame({'Values': values}, index=p1h)
Result
df.head(10)
                  Values
p1h
2020-02-01 00:00       2
2020-02-01 01:00       9
2020-02-01 02:00       1
2020-02-01 03:00       4
2020-02-01 04:00       1
2020-02-01 05:00       7
2020-02-01 06:00       7
2020-02-01 07:00       7
2020-02-01 08:00      10
2020-02-01 09:00       6
Is there a way to set up a new column so as to get the following result? (The first and last rows of each 5h period are initialized to 0 and 1, respectively.)
df['period5h'] = df.resample('5h').???
df.head(10)
                  Values  period5h
p1h
2020-02-01 00:00       2         0   <- 1st row of 5h period
2020-02-01 01:00       9
2020-02-01 02:00       1
2020-02-01 03:00       4
2020-02-01 04:00       1         1   <- last row of 5h period
2020-02-01 05:00       7         0   <- 1st row of 5h period
2020-02-01 06:00       7
2020-02-01 07:00       7
2020-02-01 08:00      10
2020-02-01 09:00       6         1   <- last row of 5h period
Can this be done with built-in pandas functions?
The ultimate goal is to fill the empty values by linear interpolation between 0 and 1, so that each row shows its progress (in %) through the 5h period it belongs to.
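To make the target concrete, here is a minimal sketch of one way to get there without `resample`. It assumes the index is sorted, has no gaps, and that its first row is the start of a 5h period; the names `n`, `pos` and `progress` are only illustrative and not part of the original data.

```python
import numpy as np
import pandas as pd

# First 10 rows of the example data, hard-coded to keep the sketch short
p1h = pd.period_range(start='2020-02-01 00:00', periods=10, freq='1h', name='p1h')
df = pd.DataFrame({'Values': [2, 9, 1, 4, 1, 7, 7, 7, 10, 6]}, index=p1h)

n = 5  # number of 1h rows in one 5h period (assumption: no missing rows)

# Hours elapsed since the first row, then position of each row inside its 5h period
ts = df.index.to_timestamp()
hours = np.asarray((ts - ts[0]) // pd.Timedelta(hours=1))
pos = hours % n

# Mark the first row of each period with 0 and the last row with 1
df['period5h'] = np.nan
df.loc[pos == 0, 'period5h'] = 0.0
df.loc[pos == n - 1, 'period5h'] = 1.0

# Fill the gaps linearly; method='linear' treats rows as equally spaced
df['progress'] = df['period5h'].interpolate(method='linear')
print(df)
```

Under the same assumptions, `pos / (n - 1)` gives the progress directly, without the intermediate 0/1 column. That form may also be preferable when the last 5h period is incomplete, since `interpolate` has no closing 1 to interpolate towards there.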
Another track / question
Another approach could be to initialize a 2nd DataFrame with a 5h PeriodIndex, initialize the values of a new column to 1, and then upsample the PeriodIndex back to 1h to merge both DataFrames.
A shift(-1) would then move the 1 onto the last row of each period.
I would repeat the process without the shift for the value 0.
How can I create this new DataFrame so that I can merge it with the first one? I tried a few merge commands, but I get an error saying that the two indexes don't have the same frequency.
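For reference, a rough sketch of what this second approach might look like if the frequency mismatch is sidestepped by building the 5h boundaries as timestamps and converting them to 1h periods before joining. This is only one possible reading of the idea: `starts` and `marks` are hypothetical names, and the 5h periods are assumed to begin at the first row of `df`.

```python
import pandas as pd

# First 10 rows of the example data, hard-coded to keep the sketch short
p1h = pd.period_range(start='2020-02-01 00:00', periods=10, freq='1h', name='p1h')
df = pd.DataFrame({'Values': [2, 9, 1, 4, 1, 7, 7, 7, 10, 6]}, index=p1h)

# Start of every 5h period, as timestamps (assumption: periods begin at the first row)
starts = pd.date_range(start=df.index[0].start_time,
                       end=df.index[-1].start_time, freq='5h')

# 0 on the first hour of each 5h period, 1 on its last hour (4 hours later),
# both expressed as 1h periods so the index frequency matches df
marks = pd.concat([
    pd.Series(0.0, index=starts.to_period('h')),
    pd.Series(1.0, index=(starts + pd.Timedelta(hours=4)).to_period('h')),
]).sort_index().rename('period5h')

# Left join on the index: rows without a mark stay NaN
df = df.join(marks)
print(df)
```

Because both indexes are 1h PeriodIndexes by the time they are joined, the frequency error should not come up; the 5h spacing lives only in the timestamps used to build `marks`.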
Thanks for your help! Bests