I have a dataframe df:
year | month | A
2020 | 1 | 4
2020 | 2 | 4
2020 | 3 | 6
2020 | 4 | 5
2020 | 5 | 5
I want to resample it to weekly frequency and add the week of the year as a new column. Each week of a month should receive an equal share of that month's value in column A.
import pandas as pd
# Use the first day of each month as the date
df["day"] = 1
df["date"] = df[["year", "month", "day"]].astype(str).apply(lambda x: pd.to_datetime('-'.join(x)), axis=1)
df = df.set_index("date")
df = df.drop(["year", "month", "day"], axis=1)
Results in:
date | A
2020-01-01 | 4
2020-02-01 | 4
2020-03-01 | 6
2020-04-01 | 5
2020-05-01 | 5
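For reference, a more direct way to build the same date index might look like the sketch below. It assumes the column names from the frame above and relies on pd.to_datetime assembling dates from columns named year, month and day. The input frame is rebuilt inline only so that the snippet runs on its own.

```python
import pandas as pd

# Rebuild the example frame from the question so the snippet is self-contained
df = pd.DataFrame({
    "year": [2020, 2020, 2020, 2020, 2020],
    "month": [1, 2, 3, 4, 5],
    "A": [4, 4, 6, 5, 5],
})

# pd.to_datetime can assemble dates from year/month/day columns;
# the day column is added on the fly and fixed to the first of the month
df["date"] = pd.to_datetime(df[["year", "month"]].assign(day=1))

# Keep only the value column, indexed by date
df = df.set_index("date")[["A"]]
print(df)
```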
Now I resample:
s = df.resample(rule="W").mean()
s = s.groupby(s["A"].notnull().cumsum())["A"].transform(lambda x : x.sum()/len(x)).to_frame()
s = s.reset_index()
s["week_of_year"] = s["date"].dt.isocalendar().week
s = s.set_index("date")
Results in:
date | A | week_of_year
2020-01-03 | 0.8 | 53
2020-01-10 | 0.8 | 1
2020-01-17 | 0.8 | 2
2020-01-24 | 0.8 | 3
2020-01-31 | 0.8 | 4
2020-02-07 | 1 | 5
Question: Why does the first week start at 2020-01-03? That's a Friday, not a Monday. I understand the week_of_year = 53 because the end of December 2019 is on a Monday and that follows the ISO standard.
But shouldn't the next week then start on 2020-01-06, which is a Monday?
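To make this easier to reproduce, here is a minimal sketch that checks which weekday the resampled labels fall on. The two-row frame is a made-up reduction of the data above. As far as I understand the pandas docs, rule "W" is an alias for "W-SUN" (weeks ending on Sunday, labelled by that end date). "W-MON" is included only for comparison.

```python
import pandas as pd

# Hypothetical two-row reduction of the monthly data
idx = pd.to_datetime(["2020-01-01", "2020-02-01"])
df = pd.DataFrame({"A": [4, 4]}, index=idx)

# Default weekly rule: bins are labelled by their end date
weekly_default = df.resample("W").mean()
print(weekly_default.index.day_name().unique())

# Weekly rule anchored on Monday, for comparison
weekly_monday = df.resample("W-MON").mean()
print(weekly_monday.index.day_name().unique())
```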