I have a dataframe df:

year | month | A
-----|-------|---
2020 | 1     | 4
2020 | 2     | 4
2020 | 3     | 6
2020 | 4     | 5
2020 | 5     | 5

I want to resample this to a weekly frequency and add the week of the year as a new column. Each week of a month should receive an equal share of that month's value in column A.

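```python
import pandas as pd

# Build a first-of-month date from the year/month columns and use it as the index.
df["day"] = 1
df["date"] = df[["year", "month", "day"]].astype(str).apply(lambda x: pd.to_datetime("-".join(x)), axis=1)
df = df.set_index("date")
df = df.drop(["year", "month", "day"], axis=1)
```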

Results in:

date       | A
-----------|---
2020-01-01 | 4
2020-02-01 | 4
2020-03-01 | 6
2020-04-01 | 5
2020-05-01 | 5
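For reference, `pd.to_datetime` can also assemble dates directly from `year`/`month`/`day` columns, which avoids the string join and the row-wise `apply`. A minimal sketch that rebuilds the same index (the `DataFrame` literal is simply the sample table from the question retyped):

```python
import pandas as pd

# Sample monthly data from the question.
df = pd.DataFrame({"year": [2020] * 5, "month": [1, 2, 3, 4, 5], "A": [4, 4, 6, 5, 5]})

# pd.to_datetime accepts a frame with year/month/day columns and builds
# one timestamp per row; .assign(day=1) supplies the first of each month.
df["date"] = pd.to_datetime(df[["year", "month"]].assign(day=1))
df = df.set_index("date").drop(columns=["year", "month"])
print(df)
```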

Now I resample:

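```python
# Weekly bins; weeks that contain no month start come out as NaN.
s = df.resample(rule="W").mean()
# Spread each monthly value evenly over its run of weeks (up to the next non-NaN value).
s = s.groupby(s["A"].notnull().cumsum())["A"].transform(lambda x: x.sum() / len(x)).to_frame()
s = s.reset_index()

s["week_of_year"] = s["date"].dt.isocalendar().week
s = s.set_index("date")
```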

Results in:

date       | A   | week_of_year
-----------|-----|-------------
2020-01-03 | 0.8 | 53
2020-01-10 | 0.8 | 1
2020-01-17 | 0.8 | 2
2020-01-24 | 0.8 | 3
2020-01-31 | 0.8 | 4
2020-02-07 | 1   | 5
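On the Friday labels: in pandas the `"W"` alias is shorthand for `"W-SUN"`, so weekly bins end on a Sunday and, by default, are labeled with that Sunday. That is consistent with the 2020-01-05 reported in the comments. Labels such as 2020-01-03 are what a Friday anchor (`"W-FRI"`) produces. The sketch below compares the two; that the output above came from a Friday-anchored rule is an assumption, although with `"W-FRI"` the question's grouping step does reproduce the 0.8 January split. `split_weekly` is a hypothetical helper that wraps the question's code.

```python
import pandas as pd

# Monthly values from the question, indexed by the first day of each month.
df = pd.DataFrame(
    {"A": [4, 4, 6, 5, 5]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01"]),
)


def split_weekly(frame: pd.DataFrame, rule: str) -> pd.Series:
    """Hypothetical helper: resample to weeks, then spread each monthly value
    evenly over the weeks up to the next non-empty bin (the question's trick)."""
    s = frame.resample(rule)["A"].mean()
    groups = s.notnull().cumsum()
    return s.groupby(groups).transform(lambda x: x.sum() / len(x))


sun = split_weekly(df, "W")      # "W" is an alias for "W-SUN"
fri = split_weekly(df, "W-FRI")  # weeks ending, and labeled, on Friday

print(sun.index[0].day_name(), sun.index[0].date())  # Sunday 2020-01-05
print(fri.index[0].day_name(), fri.index[0].date())  # Friday 2020-01-03
print(fri.head(6))
```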

Question: Why does the first week start on 2020-01-03? That's a Friday, not a Monday. I understand the week_of_year of 53, because the end of December 2019 falls on a Monday and that's the ISO standard.

But then shouldn't my next week start on 2020-01-06, which is a Monday?
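If the goal is weeks that start on Monday, one option is a Monday-anchored rule with left-closed, left-labeled bins, so each bin covers [Monday, next Monday) and is named after its Monday. A minimal sketch under that assumption, reusing the question's even-split grouping and adding ISO week numbers. Note that under ISO 8601, 2019-12-30 already belongs to week 1 of 2020, so `isocalendar()` reports week 1 for the first bin.

```python
import pandas as pd

# Monthly values from the question, indexed by the first day of each month.
df = pd.DataFrame(
    {"A": [4, 4, 6, 5, 5]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01", "2020-05-01"]),
)

# Bins are [Monday, next Monday) and each bin is labeled by its starting Monday.
s = df.resample("W-MON", closed="left", label="left")["A"].mean()

# Spread each monthly value evenly over the weeks until the next month's value,
# using the same notnull().cumsum() grouping as the question.
s = s.groupby(s.notnull().cumsum()).transform(lambda x: x.sum() / len(x)).to_frame("A")

# ISO 8601 week numbers for each Monday label.
s["week_of_year"] = s.index.isocalendar().week.to_numpy()
print(s.head(6))
```

With this data the January and February values are each spread over four weeks. Because 2020-03-01 is a Sunday, the March value lands in the week labeled 2020-02-24 and is spread over five weeks; the May value is not spread at all, since no later month start closes its run. Whether that assignment of boundary weeks is acceptable depends on how partial weeks should be counted.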

Asked by Vega

Comments:

  • I tried to replicate your problem, but I can't. The first date I get with `s = df.resample(rule="W").mean()` is `2020-01-05`, so everything works as expected (assuming the week starts on Sunday). – Let's try Aug 06 '20 at 10:30
  • Why should the first day of the week be Sunday? Isn't it Monday, at least in the EU/US? – Vega Aug 06 '20 at 10:56
  • Check [here](https://stackoverflow.com/questions/14530556/resample-time-series-in-pandas-to-a-weekly-interval): you have to use `ts.resample('W-MON')`. – Let's try Aug 06 '20 at 11:37
  • According to your link, doesn't `.resample("W-MON")` mean that the week ends on Monday, not that it starts on it? – Vega Aug 07 '20 at 08:04
  • Hmm, true, I read it too quickly. Nevertheless, there are more options there, like `ts.resample('D').interpolate()[::7]`, and a link to a GitHub [issue](https://github.com/pandas-dev/pandas/issues/16381) that may help you. – Let's try Aug 07 '20 at 09:18
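On the `"W-MON"` question in the comments: by default a `"W-MON"` bin is closed on the right and labeled with its right edge, so it runs Tuesday through Monday and is named after the Monday that ends it. Passing `closed="left", label="left"` gives Monday-through-Sunday bins named after the Monday that starts them. A small sketch with one made-up value per day for 2020-01-01 to 2020-01-14 makes the difference visible:

```python
import pandas as pd

# Illustration data: a value of 1 for every day from 2020-01-01 to 2020-01-14.
days = pd.Series(1, index=pd.date_range("2020-01-01", "2020-01-14", freq="D"))

# Default for "W-MON": right-closed, right-labeled bins (Tuesday..Monday),
# each named after the Monday that ends it.
ending = days.resample("W-MON").sum()

# Left-closed, left-labeled bins (Monday..Sunday), named after the starting Monday.
starting = days.resample("W-MON", closed="left", label="left").sum()

print(ending)
print(starting)
```

Here `ending` is labeled 2020-01-06, 2020-01-13 and 2020-01-20 and counts 6, 7 and 1 days, while `starting` is labeled 2019-12-30, 2020-01-06 and 2020-01-13 and counts 5, 7 and 2 days.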
