I am currently working on a time series project. I have daily data points over a 5-year timespan. Some days have 0 values and some days are missing. For example:

2015-01-10  343
2015-03-10  128

Day 2 of October is missing. To build a good time series model, I want to resample the data to monthly:

df.individuals.resample("M").sum()

but I am getting the following output:

2015-01-31    343.000000
2015-02-28           NaN
2015-03-31     64.500000

Somehow the months are completely wrong.

The expected output would look like this:

2015-31-10  Sum of all days
2015-30-11  Sum of all days
2015-31-12  Sum of all days
  • This is not wrong. There is no data for February 2015 so the `mean` is `np.nan`. And resampling monthly `resample("M")` gives you month-end dates. What are you trying to get? Provide your expected output. – not_speshal Jul 09 '21 at 12:44
  • did you mean `df.individuals.resample("MS", loffset=pd.Timedelta(days=9)).mean()`? additionally, you can add `.interpolate()` to interpolate the missing data ([docs](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.interpolate.html)). – FObersteiner Jul 09 '21 at 12:59
  • Hello, I am trying to get the sum and the mean of the absolute numbers. But as you can see, the sums of the months are wrong (I double-checked them in Excel), and my data starts at 2015-01-10 (October) while the output data starts in January. – Mannheimer_Coder Jul 09 '21 at 13:03

1 Answer

Pandas is interpreting your dates as `%Y-%m-%d`, but they are stored in year-day-month order, so 2015-01-10 is read as 10 January instead of 1 October. Specify the date format explicitly before resampling. Try this:

>>> df.index = pd.to_datetime(df.index, format="%Y-%d-%m")
>>> df.resample("M").sum()
2015-10-31  471
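
To see the difference side by side, here is a minimal, self-contained sketch. The two-row series is a hypothetical stand-in for the data in the question, not the asker's real DataFrame. It groups by monthly period rather than calling `resample`, only because the month-end alias differs between pandas versions (`"M"` in older releases, `"ME"` in newer ones).

```python
import pandas as pd

# Hypothetical sample mirroring the question: index strings in year-day-month order.
raw = pd.Series(
    [343, 128],
    index=["2015-01-10", "2015-03-10"],
    name="individuals",
)

# Default parsing treats the strings as year-month-day (10 January, 10 March).
wrong = raw.copy()
wrong.index = pd.to_datetime(wrong.index)
wrong_monthly = wrong.groupby(wrong.index.to_period("M")).sum()

# An explicit format reads them as year-day-month (1 October, 3 October).
right = raw.copy()
right.index = pd.to_datetime(right.index, format="%Y-%d-%m")
right_monthly = right.groupby(right.index.to_period("M")).sum()

print(wrong_monthly)  # two separate months: January and March
print(right_monthly)  # a single month: October
```

With the explicit format, both rows land in October 2015 and sum to 471, which matches the output above.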
not_speshal