I'm trying to do weekly averages from meteorological time series data with several values per day. I found two ways of doing it, one with groupby and resample and the other one with groupby alone (based on this) but they give slightly different results. Can anyone tell me why??
Note: Sorry for the sample data, it's my first post in stackoverflow and I'm still figuring out how to do it. Any suggestions for improvement of this and future post are welcome.
station lon lat date time tmpf tmpc dwpf dwpc relh
0 IKV -93.5695 41.6878 2012-08-01 00:00 86.0 30.0 62.6 17.0 NaN
1 IKV -93.5695 41.6878 2012-08-01 00:10 86.0 30.0 62.6 17.0 45.61
2 IKV -93.5695 41.6878 2012-08-01 00:15 86.0 30.0 62.6 17.0 45.61
3 IKV -93.5695 41.6878 2012-08-01 00:20 86.0 30.0 64.4 18.0 48.58
4 IKV -93.5695 41.6878 2012-08-01 00:30 84.2 29.0 64.4 18.0 51.47
Option 1:
weeklydata_rs = df.groupby("station").resample('W-Sat', label='right', closed = 'right', on='date').mean().reset_index().sort_values(by='date')
Option 2:
weeklydata_gb = df.groupby(["date"], as_index=False).mean()
weeklydata_gb = weeklydata_gb.groupby([[i//7 for i in range(0,27)]],
axis = 0).mean()
Results examples:
weeklydata_rs
:
station date lon lat tmpf tmpc dwpf dwpc relh
1 IKV 2012-08-11 -93.5695 41.6878 70.130229 21.183461 57.311327 14.061849 67.087561
2 IKV 2012-08-18 -93.5695 41.6878 66.660856 19.256031 52.835175 11.575097 64.512218
weeklydata_gb
:
lon lat tmpf tmpc dwpf dwpc relh
0 -93.5695 41.6878 69.977555 21.098642 57.227445 14.015247 67.210785
1 -93.5695 41.6878 67.098785 19.499325 53.195740 11.775411 64.395461
Thanks!