I have time series data spanning multiple years. Instead of using the first 80% of the whole dataset for training and the remaining 20% for testing, I want to apply that 80/20 split within every month. Does anyone know how to do this in Python, for example for use with XGBoost?
1 Answer
You could try the stratify parameter of scikit-learn's train_test_split() function, so something like: train_test_split(X, y, stratify=X['month_variable']).
That should give you a train/test split that is stratified by month, so every month is represented in both sets in roughly the same proportion.
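A minimal sketch of the stratified split, assuming a daily DataFrame in which month_variable holds a year-month label. The synthetic data, the column name feature and the 0.2 test size are placeholders for illustration, not anything from the question.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: one row per day over two years.
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
rng = np.random.default_rng(0)
X = pd.DataFrame(
    {
        "feature": rng.normal(size=len(dates)),
        # Year-month label, so each month of each year is its own stratum.
        "month_variable": dates.strftime("%Y-%m"),
    },
    index=dates,
)
y = pd.Series(rng.normal(size=len(dates)), index=dates)

# Stratifying on the month label keeps about 20% of every month in the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=X["month_variable"], random_state=0
)

# Fraction of each month's rows that landed in the test set.
test_share = (
    X_test["month_variable"].value_counts() / X["month_variable"].value_counts()
)
```

Note that stratify draws the test rows at random within each month, so the test rows are not necessarily the last 20% of the month in time order.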
If you want train and test sets which only contain a certain month, I would recommend making a separate DataFrame per month, so something like: df_jan = df.loc[df['month_variable'] == 'january'], and then performing train_test_split on each one and building separate models.
It just depends on what you want.
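The per-month approach could be sketched as below, again on made-up data with a hypothetical year-month label in month_variable. This variant passes shuffle=False, which is a swap from the stratified call above: it keeps the rows in their original order, so each month's first 80% goes to training and its last 20% to testing.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: one row per day over two years, sorted by date.
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "feature": rng.normal(size=len(dates)),
        "target": rng.normal(size=len(dates)),
        "month_variable": dates.strftime("%Y-%m"),
    },
    index=dates,
)

# One (train, test) pair per month; shuffle=False preserves time order.
splits = {}
for month, df_month in df.groupby("month_variable"):
    train, test = train_test_split(df_month, test_size=0.2, shuffle=False)
    splits[month] = (train, test)

# Optionally recombine into a single train set and a single test set.
train_all = pd.concat([tr for tr, _ in splits.values()])
test_all = pd.concat([te for _, te in splits.values()])
```

Each pair in splits could be fed to its own model (for example an xgboost.XGBRegressor), or train_all and test_all could be used with a single model, depending on which of the two setups you are after.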

Jem