I'm trying to split my data into training and testing sets, but when I call train_test_split I get an error saying that n_samples = 0.

Here's my code:

X_train, X_test, y_train, y_test = model_selection.train_test_split(summary, labels, test_size=0.35)

summary and labels are lists. After converting them to arrays, these are the shapes I get:

(1248,)
(1248,)

They both have 1248 values. Can someone tell me why it's not working? Thanks.

Error Message:

With n_samples=0, test_size=0.35 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters
John

1 Answer

This works for me. Check whether it works for you:

from sklearn.model_selection import train_test_split
import numpy as np

# dummy examples
summary, labels = np.arange(0,1248), np.arange(0,1248)

X_train, X_test, y_train, y_test = train_test_split(summary, labels, test_size=0.35)

The same test with lists of strings:

summary, labels = ["hello"]*1248, ["test"]*1248
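
If the call still fails on your real data, it may help to inspect the inputs right before splitting. As far as I can tell, n_samples=0 means the first array passed to train_test_split is empty at that point, so one guess is that summary is emptied or reassigned somewhere between the shape check and the call. Below is a minimal sketch of such a check; checked_split is a hypothetical helper name, not part of scikit-learn, and the dummy string lists stand in for your real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def checked_split(summary, labels, test_size=0.35):
    """Hypothetical helper: validate the inputs, then call train_test_split."""
    summary = np.asarray(summary)
    labels = np.asarray(labels)

    # Print the shapes at the exact point where the split happens.
    print("summary shape:", summary.shape)
    print("labels shape:", labels.shape)

    # An empty first array is what leads to the n_samples=0 error.
    if len(summary) == 0:
        raise ValueError("summary is empty right before the split")
    if len(summary) != len(labels):
        raise ValueError("summary and labels have different lengths")

    return train_test_split(summary, labels, test_size=test_size)


# Dummy string data, same size as in the question.
summary, labels = ["hello"] * 1248, ["test"] * 1248
X_train, X_test, y_train, y_test = checked_split(summary, labels)
print(len(X_train), len(X_test))  # 811 437
```

If the printed shapes are not (1248,) here, the problem is in the code that runs before the split rather than in train_test_split itself.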
YOLO
  • I added the `np.arange(0,1248)` part to my code and got this error: "cannot use a string pattern on a bytes-like object" – John Aug 07 '20 at 21:22
  • The summary list contains text as I am trying to train a bag of words model. Hope this info helps. – John Aug 07 '20 at 21:23
  • @John check the edit and replace those two lists with the new string lists. It will still work. – YOLO Aug 07 '20 at 21:26