How to split data into train, test and validation sets using train_test_split in Python? The split should not be random

Question

I want to split data category-wise into train, test, and validation sets. For example, suppose the dataset has three categories: positive, negative, and neutral. The positive category should be split into train, test, and validation sets, and likewise for the other two categories. The split ratio is 80% of the data for training and 20% for testing; of the 80% training data, 10% should be held out for validation. Most importantly, the split should not be random.

score 0 · Answer 1 · answered Nov 21 '19 at 08:19


You can use the stratify parameter to do this:

For example, with the Iris dataset:

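from sklearn import datasets
from sklearn.model_selection import train_test_split

# Load the Iris dataset and keep only the first two features
iris = datasets.load_iris()
X = iris.data[:, :2]
y = iris.target

# stratify=y keeps the class proportions the same in the train and test parts
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y)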

You can read more here: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html
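To get the three sets asked for in the question, one option is to call train_test_split twice. The following is only a sketch under some assumptions: the Iris data stands in for the real dataset, the variable names are illustrative, and random_state=42 is an arbitrary seed. As far as I know, stratify only works with shuffling enabled, so the split is still shuffled internally; fixing random_state makes it reproducible (the same rows land in the same set on every run) rather than truly non-random.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Iris is used here only as a stand-in for the real dataset
iris = load_iris()
X, y = iris.data, iris.target

# Step 1: 80% train+validation, 20% test, with class proportions preserved
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42
)

# Step 2: hold out 10% of the remaining training data for validation
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.10, stratify=y_trainval, random_state=42
)

# Per-class counts in each set, to check that every category was split
print("train:", Counter(y_train.tolist()))
print("val:  ", Counter(y_val.tolist()))
print("test: ", Counter(y_test.tolist()))
```

With the 150 Iris samples this should give 108 training, 12 validation, and 30 test rows, with each of the three classes equally represented in every set. Rerunning with the same random_state reproduces the same assignment.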

answered Nov 21 '19 at 08:19

Sharath


It doesn't matter. The stratify parameter takes care of this; it works with any number of classes. – Sharath Nov 21 '19 at 12:02
Welcome. Please mark it as the accepted answer so that others can benefit as well. – Sharath Nov 21 '19 at 14:31
