I am trying to use XGBoost (in Python) to predict advertising campaign revenue from several features, including the day the campaign is active, the number of installs, and the spend. I would like to configure XGBoost so that, for example, the 3 most recent days (for campaigns that have been active for longer than 3 days) carry more weight in the revenue prediction. I suspect that DMatrix has to be involved somehow, and that the dataframe I am working with may also need to be transformed. Currently, my data has the following format (with made-up example values):
campaign | day of activity | installs | spend | revenue |
---|---|---|---|---|
A | 1 | 24 | 230 | 50 |
A | 2 | 36 | 230 | 62 |
A | 3 | 48 | 235 | 77 |
A | 4 | 49 | 235 | 79 |
C | 1 | 2 | 100 | 13 |
C | 2 | 6 | 100 | 14 |
C | 3 | 7 | 105 | 16 |
So I assign revenue to y and the remaining columns to X, split the data into train and test sets, and then configure the XGBRegressor parameters.
That said, I would appreciate help with:
- determining which approach to use to get the weighting described above
- at what point the weighting has to be introduced (relative to assigning the variables, doing the train/test split, and setting the XGB parameters)