Since 2020 we have run marketing campaigns almost every Sunday. The campaign is basically a high discount on most of the products. I'm trying to estimate its impact with an XGBoost model by predicting what Sunday sales would be without the campaign.
However, I've run into a problem: XGBoost has apparently learned that there is a campaign every Sunday. When we tested some Sundays without a discount, the predicted values were significantly above the sales we observed. We sometimes change the type and value of the discount, and we run similar campaigns on other weekdays, though less frequently; the predictions for those other days are quite precise. So I think there should be enough data in general.
My question is: is there a method or trick to better separate the effect of the campaigns/discounts and get better results for Sundays?
As additional information, I have split the date into the following columns:
- year
- month
- calendarweek
- day of year
- day of month
- day of week
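
For reference, the date split above could be sketched with pandas roughly as follows. This is only a minimal illustration: the DataFrame layout, the column name `date`, and the helper name `add_date_features` are assumptions, not my exact code.

```python
import pandas as pd

def add_date_features(df: pd.DataFrame, date_col: str = "date") -> pd.DataFrame:
    """Return a copy of df with calendar columns derived from date_col.

    The column names are illustrative assumptions.
    """
    out = df.copy()
    d = pd.to_datetime(out[date_col])
    out["year"] = d.dt.year
    out["month"] = d.dt.month
    # ISO calendar week; early January can belong to week 52/53 of the previous year
    out["calendarweek"] = d.dt.isocalendar().week.astype(int)
    out["day_of_year"] = d.dt.dayofyear
    out["day_of_month"] = d.dt.day
    # pandas convention: Monday=0 ... Sunday=6
    out["day_of_week"] = d.dt.dayofweek
    return out

# Tiny usage example: 2023-01-01 is a Sunday, 2023-01-02 a Monday
example = add_date_features(pd.DataFrame({"date": ["2023-01-01", "2023-01-02"]}))
```

Note that none of these columns says whether a campaign ran on that day; `day_of_week` only encodes the weekday itself.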
And I added the following lag features:
- y lagged by 35 days (5 weeks)
- y lagged by 364 days (about one year, but with a matching day of week)
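
A minimal sketch of how these two lags could be built, assuming one row per calendar day, a `date` column, and a target column `y` (the names and the helper `add_lag_features` are assumptions for illustration). Both 35 and 364 are multiples of 7, so each lag points to the same weekday.

```python
import pandas as pd

def add_lag_features(df, date_col="date", target_col="y", lags=(35, 364)):
    """Add y_lag_<n> columns holding the target value from n calendar days earlier.

    Assumes at most one row per day; missing days or missing history give NaN.
    """
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    # Reindex to a gap-free daily series so that shift(n) means exactly n days
    daily = out.set_index(date_col)[target_col].sort_index().asfreq("D")
    for lag in lags:
        out[f"y_lag_{lag}"] = out[date_col].map(daily.shift(lag))
    return out

# Usage on synthetic data: y is simply the day counter 0..399
demo = pd.DataFrame({
    "date": pd.date_range("2022-01-01", periods=400, freq="D"),
    "y": range(400),
})
demo_lagged = add_lag_features(demo)
```

For a Sunday row, both lag columns therefore carry the sales of an earlier Sunday.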
Maybe the lag features are the problem, because the past Sunday values almost always include the discount.
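
One way this suspicion could be checked is to measure, for Sunday rows, how often the day 35 or 364 days earlier was itself a campaign day. The sketch below assumes a hypothetical boolean column `campaign` marking campaign days (I have not described such a column above) and one row per day; the helper name `lagged_campaign_share` is also made up for illustration.

```python
import pandas as pd

def lagged_campaign_share(df, date_col="date", flag_col="campaign", lags=(35, 364)):
    """For Sunday rows, return the share whose date `lag` days earlier was a campaign day.

    `flag_col` is a hypothetical 0/1 or boolean campaign indicator.
    """
    d = df.copy()
    d[date_col] = pd.to_datetime(d[date_col])
    flag = d.set_index(date_col)[flag_col].astype(float)
    # pandas convention: Sunday has dayofweek == 6
    sundays = d.loc[d[date_col].dt.dayofweek == 6, date_col]
    shares = {}
    for lag in lags:
        lagged_flag = (sundays - pd.Timedelta(days=lag)).map(flag)
        # Sundays without enough history map to NaN and are skipped by mean()
        shares[lag] = lagged_flag.mean()
    return shares

# Synthetic example: a campaign on every Sunday
dates = pd.date_range("2022-01-01", periods=400, freq="D")
toy = pd.DataFrame({"date": dates, "campaign": dates.dayofweek == 6})
toy_shares = lagged_campaign_share(toy)
```

A share close to 1 would mean the lag columns for Sundays almost always reflect discounted sales, so the model rarely sees a Sunday whose lagged values come from a day without a discount.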