I want to resample a dataframe of hourly precipitation values to daily totals (24-hour frequency), with each bin starting at a specific hour of the day (in my case, the first bin would start at 2020-02-01 06 UTC).

Hourly dataframe for 2020-02-01: (image not included)

I tried:

df = df.resample('24H',on='date').sum()

but this resulted in the sum of hourly precipitation from 2020-02-01 06 UTC to 2020-02-01 23 UTC, instead of a full 24 hours to 2020-02-02 05 UTC.

Is there an argument you can use to fix this issue? I tried origin = 'start', but that resulted in:

TypeError: resample() got an unexpected keyword argument 'origin'

Any guidance will be helpful, thank you!

Red

1 Answer

The origin argument of resample() was added in pandas 1.1.0, so you just need to upgrade pandas to use it:

Upgrade pandas

pip install --upgrade pandas
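To confirm whether an upgrade is needed at all, a quick version check can help. This is only a minimal sketch: the helper name supports_resample_origin is made up for illustration, and it assumes the installed version string starts with numeric major and minor parts.

```python
import pandas as pd


def supports_resample_origin(version: str = pd.__version__) -> bool:
    """Hypothetical helper: True if the pandas version is 1.1 or newer."""
    # Compare only the major and minor components, e.g. "1.0.5" -> (1, 0).
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (1, 1)


# Prints the installed version and whether resample(origin=...) should exist.
print(pd.__version__, supports_resample_origin())
```

If this prints False, the TypeError from the question is expected and upgrading should resolve it.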

Sample code

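import pandas as pd

# 60 hourly values starting at 2020-02-01 06:00
d = {'c1': range(60)}
df = pd.DataFrame(d)
df['date'] = pd.date_range('2020-02-01 06:00:00',
                           periods=60,
                           freq='h')  # lowercase 'h'; uppercase 'H' is deprecated in recent pandas

# origin='start' anchors the 24-hour bins at the first timestamp (06:00)
print(df.resample('24h', on='date', origin='start').sum())

# the first bin should equal the sum of the first 24 values
print('sum of 1st 24: ', sum(range(24)))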

Output

                      c1
date                    
2020-02-01 06:00:00  276
2020-02-02 06:00:00  852
2020-02-03 06:00:00  642

sum of 1st 24:  276
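
A possible variation, not part of the answer above: origin='start' ties the bins to whatever the first timestamp in the data happens to be. If the goal is bins that always run from 06:00 to 05:00, the offset argument (also available from pandas 1.1.0) may be a better fit, because it shifts the default midnight anchor by a fixed amount. The sketch below reuses the same toy data and assumes lowercase 'h' as the hourly alias; it also shows the default binning that produced the partial 06:00 to 23:00 sum in the question.

```python
import pandas as pd

# Same toy data as the sample above: 60 hourly values from 2020-02-01 06:00.
df = pd.DataFrame({"c1": range(60)})
df["date"] = pd.date_range("2020-02-01 06:00:00", periods=60, freq="h")

# Default binning: 24-hour bins are anchored at midnight of the first day,
# so the first bin only covers 06:00 to 23:00 (18 values).
default_bins = df.resample("24h", on="date").sum()

# offset shifts the midnight anchor by 6 hours, so every bin runs from
# 06:00 to 05:00 the next day, regardless of where the data starts.
shifted_bins = df.resample("24h", on="date", offset=pd.Timedelta(hours=6)).sum()

print(default_bins)
print(shifted_bins)
```

With this data the shifted bins match the origin='start' result, since the first timestamp is already 06:00.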
Abhi_J