I have a data which include dates in sorted order.
I would like to split the given data to train and test set. However, I must to split the data in a way that the test have to be newer than the train set.
Please look at the given example:
Let's assume that we have data by dates:
1, 2, 3, ..., n.
The numbers from 1 to n represents the days.
I would like to split it to 20% from the data to be train set and 80% of the data to be test set.
Good results:
1) train set = 1, 2, 3, ..., 20
test set = 21, ..., 100
2) train set = 101, 102, ... 120
test set = 121, ... 200
My code:
train_size = 0.2
train_dataframe, test_dataframe = cross_validation.train_test_split(features_dataframe, train_size=train_size)
train_dataframe = train_dataframe.sort(["date"])
test_dataframe = test_dataframe.sort(["date"])
Does not work for me!
Any suggestions?