I have the following data frame:
product_id | year | sales | freq |
---|---|---|---|
1 | 2023 | 100 | 2 |
1 | 2022 | 200 | 3 |
2 | 2023 | 10 | 2 |
3 | 2022 | 600 | 3 |
I want to add another column that calculates and adds mean values to the data frame based on the following formula.
if df["freq"] = x: df["mean_sales"] = the mean of 'x' rows below and 'x' rows above the current row where the product id is the same.
I want to sort the table first by product ID and year. Then group it by product ID and then calculate the mean based on the formula above.
Here is my pseudocode that is currently not working as expected:
import pandas as pd
data = {
"product_id": [1, 1, 2, 3],
"year": [2023, 2022, 2023, 2022],
"sales": [100, 200, 10, 600],
"freq": [2, 3, 2, 3],
}
df = pd.DataFrame(data)
unique_frequencies = df["freq"].unique().tolist()
for frequency in unique_frequencies:
df_tmp = df[[df["freq"] == frequency]].sort_values(by=["product_id","year"], ascending=True).groupby("product_id")
df_tmp["mean_sales"] = df_tmp["sales"].rolling(window_size=frequency, closed="left").mean()
df_results = pd.concat([df_results, df_tmp])
I get the following error while trying this operation: TypeError: 'DataFrameGroupBy' object does not support item assignment.
I would really appreciate if someone can help me solve this. Alternate methods are also welcome.