I'm new to using Python in Google Colab, and I have to finish my class project now (image classification using the KNN algorithm). Please help me fix this code. Thank you.

# Importing the dataset
dataset = ('/content/dataset/Validation/')
X = dataset
y = dataset

# Cleaning up variables to prevent loading data multiple times (which may cause memory issues)
try:
   del X_train, y_train
   del X_test, y_test
   print('Clear previously loaded data.')
except NameError:
   # The variables were not defined yet, so there is nothing to clear.
   pass

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(dataset)

# As a sanity check, we print out the size of the training and test data.
print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)

  • Welcome to Stack Overflow. Please include the full error traceback. That said, are you certain that `train_test_split(dataset)` returns 4 values? – ewokx Jun 13 '23 at 08:57
  • Read the `train_test_split` [docs](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) carefully. – DeepSpace Jun 13 '23 at 08:59
  • Yes, but both `X` and `y` are just pointing to `dataset`. Instead, `X` should hold the data and `y` should hold the labels. Sometimes the last column is `y`, but we cannot assume that without the column names and descriptions, which haven't been provided. – Sembei Norimaki Jun 13 '23 at 09:14
  • If you just want `dataset` to be split into test and train sets, then simply use `X_train, X_test = train_test_split(dataset)`. You might want to look at the other options in the docs to change the size of the splits, etc. – user19077881 Jun 13 '23 at 10:20

1 Answer

train_test_split returns two subsets (train and test) for each array you pass to it, and you pass only one array to the function. If you want to get four subsets, you need to pass two arrays to the function.
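A minimal sketch of that behaviour, using small made-up NumPy arrays (the names `features` and `labels` and their values are assumptions for illustration, not your image data):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy arrays standing in for real data (hypothetical values).
features = np.arange(20).reshape(10, 2)  # 10 samples, 2 features each
labels = np.arange(10) % 2               # one label per sample

# One input array gives back two arrays: train and test.
one_input = train_test_split(features)

# Two input arrays give back four arrays: train and test for each input.
two_inputs = train_test_split(features, labels)

print(len(one_input))
print(len(two_inputs))
```

This is why unpacking the result of `train_test_split(dataset)` into four names fails: only two values come back.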

So, replace your line with:

X_train, X_test, y_train, y_test = train_test_split(X, y)

Note: X and y are the same in your code, which shouldn't be the case. X should contain all columns except the target values, and y should contain a single column with those target values.

Moreover, you should consider the additional parameters, such as random_state, which makes your split reproducible. As said in the comments, read the docs!
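Putting these points together, here is a hedged sketch of a split with separate features and labels. The random array below is only a stand-in for flattened images, and the values chosen for `test_size`, `random_state` and `stratify` are illustrative assumptions, not something taken from your project:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical stand-in for 100 grayscale 8x8 images, each flattened to 64 features.
X = rng.random((100, 64))
# Hypothetical labels: two classes with 50 samples each.
y = np.repeat([0, 1], 50)

# Fixing random_state makes the split identical on every run;
# stratify=y keeps the class proportions the same in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print('Training data shape: ', X_train.shape)
print('Training labels shape: ', y_train.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
```

With real images you would first load each file into an array and flatten it, so that `X` has one row per image and `y` has one label per image.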

S_Crespo