I'm working with pandas in Python.
My data looks like:
2018-03-16 5.0
2018-03-17 5.0
2018-03-18 5.0
...
2018-03-31 5.0
After using
resample('MS').mean()
I get the following result:
2018-03-01 5.000000
The correct result should be approximately 2.5 instead of 5 (16 days at 5.0 spread over the 31 days of March, i.e. 80/31 ≈ 2.58). The resample method simply averages the observations that exist in the month, starting from the first given day, instead of taking into account the earlier days where the value was 0. The problem is even more absurd when calculating, say, a yearly mean: for a time series starting on Dec 31st, resample would give x instead of x/365.
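To make this reproducible, here is a minimal sketch. It assumes the data is a daily Series with a DatetimeIndex and a constant value of 5.0 from 2018-03-16 to 2018-03-31; the variable names are only illustrative. The "expected" value is computed by hand as the monthly sum divided by the number of calendar days in the month, which is the number I would like resample to give me:

```python
import pandas as pd

# Assumed stand-in for my data: daily values of 5.0 from 2018-03-16 to 2018-03-31
idx = pd.date_range("2018-03-16", "2018-03-31", freq="D")
s = pd.Series(5.0, index=idx)

# What resample currently gives: the mean of the 16 rows that exist
observed_mean = s.resample("MS").mean()
print(observed_mean)

# What I expect: monthly total divided by the calendar days in that month
monthly_sum = s.resample("MS").sum()
days_in_month = monthly_sum.index.days_in_month.to_numpy()
expected = monthly_sum / days_in_month
print(expected)
```

Here observed_mean is 5.0 for March, while expected is 80/31.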
Surely, there must be a better solution than filling missing initial dates with zeroes? Is there a parameter that could take care of this problem?
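For reference, the zero-filling workaround I would like to avoid looks roughly like this. It is a sketch under the same assumption as above (a daily Series of 5.0 from 2018-03-16 to 2018-03-31), and it only pads the start of the first month, not the end of the last one:

```python
import pandas as pd

# Assumed stand-in for my data
idx = pd.date_range("2018-03-16", "2018-03-31", freq="D")
s = pd.Series(5.0, index=idx)

# Extend the index back to the first day of the first month
start = s.index.min().replace(day=1)
full_idx = pd.date_range(start, s.index.max(), freq="D")

# Days that were missing from the original data become 0.0
padded = s.reindex(full_idx, fill_value=0.0)

padded_mean = padded.resample("MS").mean()
print(padded_mean)
```

This gives the value I want for March, but it means materialising every missing day first.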
I should add that I'm primarily interested in solutions that involve resample, e.g. a parameter that fixes this. If there is none, I am also open to suggestions that use other methods.