I am trying to resample my DataFrame, but when I use pd.Grouper many values appear to be left out of the result. I have hierarchical data from 2003 to 2020 that bottoms out into time series data, which looks something like this:
polar_temp
Station_Number Date Value
417 CA002100805 20030101 -296
423 CA002202570 20030101 -269
425 CA002203058 20030101 -268
427 CA002300551 20030101 -23
428 CA002300902 20030101 -200
I set a multi index on Station_Number and Date:
import pandas as pd

# Parse the YYYYMMDD values in Date into datetimes, then index by station and date
polar_temp['Date'] = pd.to_datetime(polar_temp['Date'], format='%Y%m%d')
polar_temp = polar_temp.set_index(['Station_Number', 'Date'])
Value
Station_Number Date
CA002100805 2003-01-01 -296
CA002202570 2003-01-01 -269
CA002203058 2003-01-01 -268
CA002300551 2003-01-01 -23
CA002300902 2003-01-01 -200
Now I would like to resample the data by calculating the mean of Value over consecutive 8-day windows for each station, using:
polar_temp8d = polar_temp.groupby([pd.Grouper(level='Station_Number'),
                                   pd.Grouper(level='Date', freq='8D')]).mean()
Value
Station_Number Date
CA002100805 2003-01-01 -300.285714
2003-01-09 -328.750000
2003-01-17 -325.500000
2003-01-25 -385.833333
2003-02-02 -194.428571
... ...
USW00027515 2005-06-23 76.625000
2005-07-01 42.375000
2005-07-09 94.500000
2005-07-17 66.500000
2005-07-25 56.285714
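As a sanity check on how many rows such an 8-day mean should return, here is a minimal sketch on synthetic data. The station names `ST_A` and `ST_B` and the values are made up for illustration, and the sketch assumes one reading per station per day with no gaps, which may not hold for the real data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for polar_temp: 2 hypothetical stations, 32 daily readings each
dates = pd.date_range("2003-01-01", periods=32, freq="D")
idx = pd.MultiIndex.from_product([["ST_A", "ST_B"], dates],
                                 names=["Station_Number", "Date"])
demo = pd.DataFrame({"Value": np.arange(64, dtype=float)}, index=idx)

# Same grouping as above: one group per station and 8-day window
demo8d = demo.groupby([pd.Grouper(level="Station_Number"),
                       pd.Grouper(level="Date", freq="8D")]).mean()

print(len(demo), "input rows ->", len(demo8d), "output rows")
```

With gap-free daily data, each output row summarizes 8 input rows, so the result is expected to be roughly one eighth the size of the input.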
The problem is that only about 60,000 values are returned, whereas the input df has around 1 million rows. I tried the same procedure on only the years 2003 to 2011 and again got only about 60,000 rows back. Thus my questions:
- Did I use pd.Grouper incorrectly?
- Is the problem perhaps due to leap years?
- Or is there another way to resample the data?
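One way to tell whether rows are really being dropped, or merely averaged into fewer rows, is to count how many readings land in each bin and compare the total with the input. The following is only a sketch: `check_bin_coverage` is a hypothetical helper name, and the small frame is synthetic data with the same index layout as `polar_temp`.

```python
import numpy as np
import pandas as pd

def check_bin_coverage(df, freq="8D"):
    """Return (non-null input rows, rows counted across all bins, number of bins)."""
    grouped = df.groupby([pd.Grouper(level="Station_Number"),
                          pd.Grouper(level="Date", freq=freq)])["Value"]
    # count ignores NaN, so compare it with the non-null input rows
    stats = grouped.agg(["mean", "count"])
    return int(df["Value"].notna().sum()), int(stats["count"].sum()), len(stats)

# Synthetic example: 2 hypothetical stations, 20 daily readings each, one missing value
idx = pd.MultiIndex.from_product(
    [["ST_A", "ST_B"], pd.date_range("2003-01-01", periods=20, freq="D")],
    names=["Station_Number", "Date"])
demo = pd.DataFrame({"Value": np.arange(40, dtype=float)}, index=idx)
demo.iloc[3, 0] = np.nan

n_in, n_binned, n_bins = check_bin_coverage(demo)
print(n_in, "non-null rows in,", n_binned, "rows accounted for in", n_bins, "bins")
```

If the first two numbers match on the real data, every reading was assigned to a bin and the smaller output is just the effect of averaging; a mismatch would point to rows with missing dates or station numbers.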