-2

I have time series data spanning multiple years. Instead of using the first 80% of the whole dataset for training and the remaining 20% for testing, I want to apply that 80/20 split within every month. Does anyone know how to do this, for example with XGBoost in Python?

Herwini

1 Answer

0

You could try the stratify parameter in the train_test_split() function.

So something like: train_test_split(X, y, stratify=X['month_variable']).

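That should give you train and test sets in which every month is represented in the same proportion. Note that stratifying samples rows randomly within each month, so the test rows are not necessarily the last ones in time.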
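To make that concrete, here is a minimal sketch of the stratified split. The daily data, the feature and target columns, and the 20% test size are made up for illustration; only the month_variable name comes from the snippet above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: one row per day over two years.
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({
    "date": dates,
    "feature": np.arange(len(dates), dtype=float),
    "target": np.arange(len(dates), dtype=float) * 2.0,
})
# Calendar month (1-12); the same month in different years shares a label.
df["month_variable"] = df["date"].dt.month

X = df[["feature", "month_variable"]]
y = df["target"]

# Stratifying on the month keeps each month's share equal in train and test.
# Rows are shuffled, so this is not a chronological split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=X["month_variable"], random_state=0
)

# Fraction of each month's rows that ended up in the test set.
test_share = (
    X_test["month_variable"].value_counts() / X["month_variable"].value_counts()
)
```

X_train and y_train can then be passed to whatever model you use, XGBoost included.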

If you want train and test sets which only contain a certain month, I would recommend making a separate DataFrame per month, so something like: df_jan = df.loc[df['month_variable'] == 'january'], and then performing the train_test_split on each one and building separate models.
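A rough sketch of the per-month approach, under a few assumptions of mine: the data is hypothetical, months are grouped per year via a made-up year_month column, and shuffle=False is passed so that each month's first 80% of rows goes to training and its last 20% to testing. Model fitting is left out.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: one row per day over two years.
dates = pd.date_range("2020-01-01", "2021-12-31", freq="D")
df = pd.DataFrame({"date": dates, "target": np.arange(len(dates), dtype=float)})
# Assumed grouping key: one group per (year, month) pair.
df["year_month"] = df["date"].dt.to_period("M")

splits = {}
for period, group in df.sort_values("date").groupby("year_month"):
    # shuffle=False keeps time order: earliest 80% train, latest 20% test.
    train_part, test_part = train_test_split(group, test_size=0.2, shuffle=False)
    splits[period] = (train_part, test_part)

# Either fit one model per entry in `splits`, or pool the pieces like this:
train_df = pd.concat([train for train, _ in splits.values()])
test_df = pd.concat([test for _, test in splits.values()])
```

If you group on the calendar month alone instead, the same month from different years ends up in one group, which is closer to the df_jan example.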

Which one to use depends on what you want.

Jem