0

I'm trying to do weekly averages from meteorological time series data with several values per day. I found two ways of doing it, one with groupby and resample and the other one with groupby alone (based on this) but they give slightly different results. Can anyone tell me why??

Note: Sorry for the sample data, it's my first post in stackoverflow and I'm still figuring out how to do it. Any suggestions for improvement of this and future post are welcome.

station     lon     lat     date    time    tmpf    tmpc    dwpf    dwpc    relh
0   IKV     -93.5695    41.6878     2012-08-01  00:00   86.0    30.0    62.6    17.0    NaN
1   IKV     -93.5695    41.6878     2012-08-01  00:10   86.0    30.0    62.6    17.0    45.61
2   IKV     -93.5695    41.6878     2012-08-01  00:15   86.0    30.0    62.6    17.0    45.61
3   IKV     -93.5695    41.6878     2012-08-01  00:20   86.0    30.0    64.4    18.0    48.58
4   IKV     -93.5695    41.6878     2012-08-01  00:30   84.2    29.0    64.4    18.0    51.47

Option 1:

weeklydata_rs = df.groupby("station").resample('W-Sat', label='right', closed = 'right', on='date').mean().reset_index().sort_values(by='date')

Option 2:

weeklydata_gb = df.groupby(["date"], as_index=False).mean()
weeklydata_gb = weeklydata_gb.groupby([[i//7 for i in range(0,27)]], 
                                axis = 0).mean()

Results examples:

weeklydata_rs:

station     date    lon     lat     tmpf    tmpc    dwpf    dwpc    relh
1   IKV     2012-08-11  -93.5695    41.6878     70.130229   21.183461   57.311327   14.061849   67.087561
2   IKV     2012-08-18  -93.5695    41.6878     66.660856   19.256031   52.835175   11.575097   64.512218

weeklydata_gb:

lon     lat     tmpf    tmpc    dwpf    dwpc    relh
0   -93.5695    41.6878     69.977555   21.098642   57.227445   14.015247   67.210785
1   -93.5695    41.6878     67.098785   19.499325   53.195740   11.775411   64.395461

Thanks!

Timus
  • 10,974
  • 5
  • 14
  • 28
irmavar
  • 1
  • 2
  • Does every day have the same number of observations? If not, then you'll have a difference due to the fact that the 2. option puts equal weight on every days mean, whereas the 1. option just looks at the whole data in the week, regardless of the days. – Timus Oct 19 '21 at 13:16
  • 1
    Hi! I'd just check that and indeed the number of observations per day is different (which I was not expecting). Thanks for your answer! – irmavar Oct 19 '21 at 13:30

0 Answers0