
I am trying to split my dataset into a train and a test set using scikit-learn's StratifiedShuffleSplit, but it fails because one of the classes has only one instance.

It would be fine if that one instance went into either the train set or the test set. Is there any way to achieve that?

1 Answer


A stratified split expects at least two instances of each label in order to split the dataset correctly.

You can duplicate the samples whose label occurs only once. The split can then be performed, the model can be fitted on those samples, and you can check that it is able to predict them.

I would do it as follows:

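import pandas as pd
# Count how often each label occurs and keep the labels seen exactly once
vc = df['y'].value_counts()
unique_label = vc[vc == 1].index
# Append a copy of those rows so every label has at least two instances
df = pd.concat([df, df[df['y'].isin(unique_label)]])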
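To see how the duplication fits together with the split itself, here is a minimal end-to-end sketch. The toy DataFrame, the column names `x` and `y`, the `test_size` of 0.25, and the `random_state` are all assumptions made for illustration; they are not taken from the question. Note that the duplicated rows are not guaranteed to land one in each set.

```python
import pandas as pd
from sklearn.model_selection import StratifiedShuffleSplit

# Hypothetical toy data: label "c" occurs only once.
df = pd.DataFrame({
    "x": range(20),
    "y": ["a"] * 10 + ["b"] * 9 + ["c"],
})

# Duplicate rows whose label occurs exactly once.
vc = df["y"].value_counts()
unique_label = vc[vc == 1].index
# ignore_index=True gives the copies fresh row labels (an illustrative choice).
df = pd.concat([df, df[df["y"].isin(unique_label)]], ignore_index=True)

# Every label now has at least two rows, so the stratified split can run.
splitter = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(splitter.split(df[["x"]], df["y"]))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]
```

Because the same sample may end up in both the train and the test set, any score computed on that class is optimistic.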

NOTE: It might be wiser to remove these samples instead, as your model will have difficulty learning and predicting a class from a single example.
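If you go the removal route, a sketch along the following lines would drop every class that has fewer than two samples before splitting. The toy data and the names `keep` and `filtered` are hypothetical, chosen only for this example.

```python
import pandas as pd

# Hypothetical toy data: label "c" occurs only once.
df = pd.DataFrame({
    "x": range(6),
    "y": ["a", "a", "a", "b", "b", "c"],
})

# Keep only labels that occur at least twice.
vc = df["y"].value_counts()
keep = vc[vc >= 2].index
filtered = df[df["y"].isin(keep)]
```

The resulting `filtered` frame can then be passed to the stratified splitter without the single-instance error, at the cost of the model never seeing the dropped class.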

Antoine Dubuis