I am running the [code] for multi-label classification. How do I fix the NameError saying that "X_train" is not defined? The Python code is given below.

import scipy
from scipy.io import arff
data, meta = scipy.io.arff.loadarff('./yeast/yeast-train.arff')
from sklearn.datasets import make_multilabel_classification

# this will generate a random multi-label dataset
X, y = make_multilabel_classification(sparse=True, n_labels=20,
                                      return_indicator='sparse', allow_unlabeled=False)

# using binary relevance
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.naive_bayes import GaussianNB

# initialize binary relevance multi-label classifier
# with a gaussian naive bayes base classifier
classifier = BinaryRelevance(GaussianNB())

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

from sklearn.metrics import accuracy_score
accuracy_score(y_test,predictions)
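The code above loads yeast-train.arff into data and meta but never uses them. As a rough sketch of how such a structured array could be turned into X and y, the example below parses a tiny hypothetical ARFF string (ARFF_TEXT, with made-up attribute names) instead of the real file. It assumes the label attributes are the last N_LABELS columns and are nominal {0,1}; check the header of the real file for its actual layout.

```python
from io import StringIO

import numpy as np
from scipy.io import arff

# Hypothetical miniature ARFF file: two numeric features, two 0/1 labels
ARFF_TEXT = """@relation toy
@attribute Att1 numeric
@attribute Att2 numeric
@attribute Class1 {0,1}
@attribute Class2 {0,1}
@data
0.1,0.2,1,0
0.3,0.4,0,1
0.5,0.6,1,1
"""

# loadarff accepts a file-like object as well as a filename
data, meta = arff.loadarff(StringIO(ARFF_TEXT))

# Assumption: the label attributes are the last N_LABELS columns
N_LABELS = 2
names = meta.names()
feature_names, label_names = names[:-N_LABELS], names[-N_LABELS:]

# Numeric attributes come back as floats
X = np.column_stack([data[name] for name in feature_names]).astype(float)

# Nominal attributes come back as bytes (b'0' / b'1'), so compare explicitly
y = (np.column_stack([data[name] for name in label_names]) == b"1").astype(int)
```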
niedakh
abbas khan

2 Answers

You forgot to split the dataset into train and test sets.

First, import train_test_split:

from sklearn.model_selection import train_test_split

Then add this line before classifier.fit(), so that X_train, X_test, y_train and y_test exist:

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
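With that split in place, the whole pipeline looks roughly like the sketch below. It is a substitute under stated assumptions, not the question's exact code: so that it runs without scikit-multilearn installed, it swaps BinaryRelevance for scikit-learn's MultiOutputClassifier, which likewise fits one independent classifier per label. It also generates a dense dataset with illustrative parameters (n_samples=200, n_classes=5, n_labels=2), because GaussianNB does not accept sparse input.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier
from sklearn.naive_bayes import GaussianNB

# Dense random multi-label data (illustrative parameters)
X, y = make_multilabel_classification(n_samples=200, n_classes=5, n_labels=2,
                                      allow_unlabeled=False, random_state=0)

# The missing step: this is what defines X_train, X_test, y_train, y_test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

# One GaussianNB per label, trained independently (binary relevance)
classifier = MultiOutputClassifier(GaussianNB())
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)

# For multi-label targets this is subset accuracy: a sample only counts
# as correct if all of its labels are predicted correctly
print(accuracy_score(y_test, predictions))
```

If scikit-multilearn is installed, BinaryRelevance(GaussianNB()) from the question can be dropped in for MultiOutputClassifier(GaussianNB()); the train_test_split line is the part that fixes the NameError.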
Mufeed

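X_train does not exist; you have to split the data into train and test sets first. If you also want to standardize the features, fit the scaler on the training set only and reuse it on the test set:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
# fit on the training data only; with_mean=False because X is sparse here
s = StandardScaler(with_mean=False)
X_train = s.fit_transform(X_train)
X_test = s.transform(X_test)  # reuse the training statistics, do not refit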
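To illustrate why the scaler should be fitted on the training data only, here is a tiny sketch with made-up one-feature arrays (dense NumPy, so the default with_mean=True is fine). transform applies the training mean and standard deviation to the test data, whereas calling fit_transform on the test set recomputes them from the test data alone, so the same value can end up scaled differently.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: training mean is 2, test mean is 4
X_train = np.array([[0.0], [2.0], [4.0]])
X_test = np.array([[2.0], [6.0]])

# Correct: statistics come from the training set only
scaler = StandardScaler().fit(X_train)
right = scaler.transform(X_test)

# Leaky/inconsistent: statistics recomputed from the test set
wrong = StandardScaler().fit_transform(X_test)

print(scaler.mean_)   # training mean
print(right.ravel())  # test data on the training scale
print(wrong.ravel())  # test data on its own scale: -1 and 1
```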
endive1783