I have the following DataFrame containing daily data, and I would like to resample it to weekly data.

            Name Target Sales
Datetime
2021-12-01   Amy   1000  26000
2021-12-02   Amy   1000   0
...
2021-12-30   Amy   1000   0
2021-12-31   Amy   1000   0
2021-12-01   Zoe   1000  1680
2021-12-02   Zoe   1000   0
...
2021-12-30   Zoe   1000  19414
2021-12-31   Zoe   1000   0

I ran the following code, but the result is not what I need.

sum_dict = {'Target':'sum','Sales':'sum'}
df.groupby(['Name']).resample('W').apply(sum_dict)

In the result, the last index of each group falls in January instead of December:

            Name Target Sales
Datetime
2021-12-05   Amy   5000  35100
2021-12-12   Amy   7000  -8403.6
2021-12-19   Amy   7000  179176
2021-12-26   Amy   7000  50222
2022-01-02   Amy   5000  0
2021-12-05   Zoe   5000  1680
2021-12-12   Zoe   7000  57415
2021-12-19   Zoe   7000  80254
2021-12-26   Zoe   7000  75256.4
2022-01-02   Zoe   5000  43494

What I expect is the following:

            Name Target Sales
Datetime
2021-12-05   Amy   5000  35100
2021-12-12   Amy   7000  -8403.6
2021-12-19   Amy   7000  179176
2021-12-26   Amy   7000  50222
2021-12-31   Amy   5000  0
2021-12-05   Zoe   5000  1680
2021-12-12   Zoe   7000  57415
2021-12-19   Zoe   7000  80254
2021-12-26   Zoe   7000  75256.4
2021-12-31   Zoe   5000  43494

That is, the last period should end on the last day of the month rather than on the last day of that week. Thanks!

Hang
  • But the week `'2021-12-27/2022-01-02'` does end on `2022-01-02`... Do you want to have a broken period on purpose? – mozway Feb 03 '22 at 10:48
  • @mozway Yes, I would like to resample the daily data to weekly data, but only up to the end of the month, so that a week's end date never falls in the next month. – Hang Feb 03 '22 at 10:52

1 Answer

Given the following dataframe:

import pandas as pd

df = pd.DataFrame(
    {
        "Datetime": {
            0: "2021-12-05",
            1: "2021-12-12",
            2: "2021-12-19",
            3: "2021-12-26",
            4: "2022-01-02",
            5: "2021-12-05",
            6: "2021-12-12",
            7: "2021-12-19",
            8: "2021-12-26",
            9: "2022-01-02",
        },
        "Name": {
            0: "Amy",
            1: "Amy",
            2: "Amy",
            3: "Amy",
            4: "Amy",
            5: "Zoe",
            6: "Zoe",
            7: "Zoe",
            8: "Zoe",
            9: "Zoe",
        },
        "Target": {
            0: 5000,
            1: 7000,
            2: 7000,
            3: 7000,
            4: 5000,
            5: 5000,
            6: 7000,
            7: 7000,
            8: 7000,
            9: 5000,
        },
        "Sales": {
            0: 35100.0,
            1: -8403.6,
            2: 179176.0,
            3: 50222.0,
            4: 0.0,
            5: 1680.0,
            6: 57415.0,
            7: 80254.0,
            8: 75256.4,
            9: 43494.0,
        },
    }
)

You could try this:

from pandas.tseries.offsets import MonthEnd

# Convert "Datetime" column to pandas datetime objects
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y-%m-%d")

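# Flag rows whose month number is lower than the previous row's,
# i.e. a weekly label that rolled over from December into January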
mask = df["Datetime"].dt.month - df["Datetime"].shift(periods=1).dt.month < 0

# Apply date shift
df.loc[mask, "Datetime"] = df.loc[mask, "Datetime"].apply(lambda x: x - MonthEnd(1))

This gives the expected output:

print(df)
# Output
    Datetime Name  Target     Sales
0 2021-12-05  Amy    5000   35100.0
1 2021-12-12  Amy    7000   -8403.6
2 2021-12-19  Amy    7000  179176.0
3 2021-12-26  Amy    7000   50222.0
4 2021-12-31  Amy    5000       0.0
5 2021-12-05  Zoe    5000    1680.0
6 2021-12-12  Zoe    7000   57415.0
7 2021-12-19  Zoe    7000   80254.0
8 2021-12-26  Zoe    7000   75256.4
9 2021-12-31  Zoe    5000   43494.0
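
Note that the mask above only catches a rollover where the month number decreases (December to January), and it works on data that has already been resampled. If the daily data spans several months, one alternative is to build the period label yourself before aggregating: take the earlier of each day's week end and month end, then group on it. Below is a minimal sketch under that assumption; the function name `weekly_capped_at_month_end` and the sample `daily` frame are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical daily data crossing a month boundary (not the asker's real figures).
idx = pd.date_range("2021-11-22", "2021-12-05", freq="D")
daily = pd.concat(
    [
        pd.DataFrame({"Name": name, "Target": 1000, "Sales": 10.0}, index=idx)
        for name in ["Amy", "Zoe"]
    ]
).rename_axis("Datetime")


def weekly_capped_at_month_end(df):
    """Sum Target and Sales per Name and week, never letting a week cross a month end.

    Assumes df has a DatetimeIndex named "Datetime" and a "Name" column.
    """
    out = df.reset_index()
    # Sunday that closes each day's week (same convention as resample('W'))
    week_end = out["Datetime"].dt.to_period("W").dt.end_time.dt.normalize()
    # Last calendar day of each day's month
    month_end = out["Datetime"].dt.to_period("M").dt.end_time.dt.normalize()
    # Use the week end unless it falls in the next month, then use the month end
    out["Datetime"] = week_end.where(week_end <= month_end, month_end)
    return out.groupby(["Name", "Datetime"], as_index=False)[["Target", "Sales"]].sum()


weekly = weekly_capped_at_month_end(daily)
print(weekly)
```

With this sample, the week of 29 November to 5 December is split into a two-day period labelled 2021-11-30 and a five-day period labelled 2021-12-05, so no label ever lands in the following month.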
Laurent