
When upsampling a DataFrame, I would like the newly created rows to be left empty.

Consider the following code:

import pandas as pd

p5h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00', freq='5h', name='p5h')

df = pd.DataFrame({'Values' : 1}, index=p5h)

I would like to upsample to a '1H' frequency, leaving the new rows filled with NaN.

import numpy as np

df1h = df.asfreq('1H', method=None, how='start', fill_value=np.nan)

But here is what I get:

 df1h.head(7)

                   Values
 p5h                     
 2020-02-01 00:00       1
 2020-02-01 05:00       1
 2020-02-01 10:00       1
 2020-02-01 15:00       1
 2020-02-01 20:00       1
 2020-02-02 01:00       1
 2020-02-02 06:00       1

(I need this because I then want to merge/join/concat this DataFrame with another one that has a '1H' PeriodIndex, and that cannot be done unless the PeriodIndexes of both DataFrames share the same frequency.)

Thanks for any help! Best,

yatu
pierre_j

1 Answer


asfreq() is indeed also a method of PeriodIndex, and your index is a PeriodIndex, as its dtype shows:

df.index.dtype
# period[5H]

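However, when DataFrame.asfreq is called on a PeriodIndex, pandas delegates to PeriodIndex.asfreq, which behaves differently: it converts each existing period to the new frequency instead of inserting new rows (so fill_value has no effect), and it only takes these two parameters: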

  • freq : str. The desired frequency.

  • how : {'E', 'S', 'end', 'start'}, default 'end'. Whether to use the start or the end of the timespan.
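A minimal sketch of what those two parameters do, assuming a recent pandas where lowercase 'h' is the hourly alias (as '5h' already is in the question). Each 5h period is mapped to exactly one hourly period, so the row count never changes, which is why the question's asfreq call produces no NaN rows.

```python
import pandas as pd

# Same 5-hour PeriodIndex as in the question ('h' is the hourly alias)
p5h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00',
                      freq='5h', name='p5h')
df = pd.DataFrame({'Values': 1}, index=p5h)

# how='start': each 5h period maps to the hourly period it begins with
start_idx = p5h.asfreq('h', how='start')
# how='end': each 5h period maps to the hourly period it ends with
end_idx = p5h.asfreq('h', how='end')
print(start_idx[0], end_idx[0])

# DataFrame.asfreq on a PeriodIndex only relabels the index,
# so the row count is unchanged and no NaN rows appear
df1h = df.asfreq('h', how='start')
print(len(df), len(df1h), df1h['Values'].isna().sum())
```

With how='start' the first 5h period becomes the 00:00 hour; with how='end' it becomes the 04:00 hour.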


To handle a PeriodIndex, you could instead use resample and simply aggregate with first:

df.resample('1H').first()

                   Values
p5h                     
2020-02-01 00:00     1.0
2020-02-01 01:00     NaN
2020-02-01 02:00     NaN
2020-02-01 03:00     NaN
2020-02-01 04:00     NaN
...                  ...
2020-03-03 21:00     1.0
2020-03-03 22:00     NaN
2020-03-03 23:00     NaN
2020-03-04 00:00     NaN
2020-03-04 01:00     NaN
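If you would rather not go through resample, here is a sketch of a swapped-in alternative (not the method above): relabel each 5h period with its starting hour via PeriodIndex.asfreq, then reindex onto a full hourly period_range so the missing hours become NaN. It assumes the same data as the question; `other` is a hypothetical hourly DataFrame standing in for the one you want to join with.

```python
import pandas as pd

p5h = pd.period_range(start='2020-02-01 00:00', end='2020-03-04 00:00',
                      freq='5h', name='p5h')
df = pd.DataFrame({'Values': 1}, index=p5h)

# Relabel each 5h period with the hourly period it starts at
hourly = df.copy()
hourly.index = df.index.asfreq('h', how='start')

# Full hourly range, from the first hour of the first 5h period
# to the last hour of the last 5h period (same span as resample)
full = pd.period_range(start=df.index[0].asfreq('h', how='start'),
                       end=df.index[-1].asfreq('h', how='end'),
                       freq='h', name='p5h')
df1h = hourly.reindex(full)  # hours not present before are filled with NaN

# Hypothetical hourly frame; both indexes now share the 'h' frequency
other = pd.DataFrame({'Other': range(len(full))}, index=full)
joined = df1h.join(other)
print(joined.head(6))
```

Because both indexes are hourly PeriodIndexes, the join lines up row by row.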

If you instead define the index with pd.date_range (which gives a DatetimeIndex), asfreq behaves as you expected:

p5h = pd.date_range(start='2020-02-01 00:00', end='2020-03-04 00:00', 
                    freq='5h', name='p5h')
df = pd.DataFrame({'Values' : 1}, index=p5h)

df.asfreq('1H')

                      Values
p5h                        
2020-02-01 00:00:00     1.0
2020-02-01 01:00:00     NaN
2020-02-01 02:00:00     NaN
2020-02-01 03:00:00     NaN
2020-02-01 04:00:00     NaN
...                     ...
2020-03-03 17:00:00     NaN
2020-03-03 18:00:00     NaN
2020-03-03 19:00:00     NaN
2020-03-03 20:00:00     NaN
2020-03-03 21:00:00     1.0
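Since the goal is to join with a frame that has an hourly PeriodIndex, one possible follow-up, sketched under the assumption that hourly periods are what the other frame uses, is to upsample on the DatetimeIndex and then convert back with DataFrame.to_period:

```python
import pandas as pd

# DatetimeIndex version from the answer
p5h = pd.date_range(start='2020-02-01 00:00', end='2020-03-04 00:00',
                    freq='5h', name='p5h')
df = pd.DataFrame({'Values': 1}, index=p5h)

# Upsample: asfreq inserts the missing hourly timestamps as NaN rows
df1h = df.asfreq('h')

# Convert the DatetimeIndex to an hourly PeriodIndex so it can be
# joined with another frame indexed by hourly periods
df1h_p = df1h.to_period('h')
print(df1h_p.index.freqstr, len(df1h_p))
```

Note that this range ends at 2020-03-03 21:00, the last timestamp, whereas the resample version runs to the end of the last 5h period.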
yatu