How to use train_test_split? Fix error n_samples = 0

Question

I'm trying to split the data I am working with into training and testing sets but I get the error that n_samples = 0 when I use the train_test_split function.

Here's my code:

X_train, X_test, y_train, y_test = model_selection.train_test_split(summary, labels, test_size=0.35)

summary and labels are lists and after converting them to arrays this is the shape I get:

(1248,)
(1248,)

They both have 1248 values. Can someone tell me why it's not working? Thanks.

Error Message:

With n_samples=0, test_size=0.35 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters

YOLO · Answer 1 · 2020-08-07T21:26:31.090

This works for me; check whether it works for you:

from sklearn.model_selection import train_test_split
import numpy as np

# dummy examples
summary, labels = np.arange(0,1248), np.arange(0,1248)

X_train, X_test, y_train, y_test = train_test_split(summary, labels, test_size=0.35)

To test with lists of strings instead:

summary, labels = ["hello"]*1248, ["test"]*1248
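The error message reports n_samples=0, which suggests that whatever was actually passed to train_test_split was empty at the time of the call, even though the shapes printed earlier looked fine. That is an assumption, since the question does not show the code between the shape check and the split. A minimal sketch of a length check right before splitting, where `checked_split` is a hypothetical helper and not part of scikit-learn:

```python
from sklearn.model_selection import train_test_split


def checked_split(summary, labels, test_size=0.35):
    """Hypothetical helper: verify the inputs before calling train_test_split."""
    n_summary, n_labels = len(summary), len(labels)
    # An empty input is what would lead to the n_samples=0 error.
    if n_summary == 0 or n_labels == 0:
        raise ValueError(
            f"empty input: len(summary)={n_summary}, len(labels)={n_labels}"
        )
    # Both inputs must have the same number of samples.
    if n_summary != n_labels:
        raise ValueError(
            f"length mismatch: len(summary)={n_summary}, len(labels)={n_labels}"
        )
    return train_test_split(summary, labels, test_size=test_size)


# Same dummy string lists as above
summary, labels = ["hello"] * 1248, ["test"] * 1248
X_train, X_test, y_train, y_test = checked_split(summary, labels)
print(len(X_train), len(X_test))
```

If the check fails with your real data, print len(summary) and len(labels) on the line just before the split to see which variable is empty at that point.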

edited Aug 07 '20 at 21:26

answered Aug 07 '20 at 21:19

YOLO

I added the `np.arange(0,1248)` part to my code and got this error: cannot use a string pattern on a bytes-like object – John Aug 07 '20 at 21:22
The summary list contains text as I am trying to train a bag of words model. Hope this info helps. – John Aug 07 '20 at 21:23
@John check the edit and replace those two lists with the new string lists. It will still work. – YOLO Aug 07 '20 at 21:26
