I have a dataset with a 'text' column containing articles and 66 labels (targets). I want to apply multilabel classification to determine which labels each article relates to. The train and test splits have these shapes:

x_train.shape
(3975, 3788)

y_train.shape
(3975, 66)

x_test.shape
(994, 3788)

y_test.shape
(994, 66)

When I run the loop over the classifiers, I get this error:

`ValueError: bad input shape (3975, 66)`

Here is the code:

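import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

sgd = SGDClassifier()
lr = LogisticRegression(solver='lbfgs')
svc = LinearSVC()  # instantiate the estimator; the bare class cannot be fitted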

def j_score(y_true, y_pred):
  jaccard = np.minimum(y_true, y_pred).sum(axis =1)/np.maximum(y_true, y_pred).sum(axis =1)
  return jaccard.mean()*100

def print_score(y_pred, clf):
  print('Clf: ', clf.__class__.__name__)
  print('Jaccard score: {}'.format(j_score(y_test, y_pred)))
  print('----')

for classifier in [sgd, lr, svc]:
  clf = OneVsOneClassifier(classifier)
  clf.fit(x_train, y_train)
  y_pred = clf.predict(x_test)
  print_score(y_pred, classifier)
desertnaut

1 Answer

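It looks like you have mixed up multi-label and multi-class classification. OneVsOneClassifier is a multi-class strategy: it expects a 1-D target with one class per sample, so passing the 2-D (3975, 66) label matrix raises the bad input shape error. For multi-label classification you need a model or wrapper that supports a 2-D indicator target. The scikit-learn guide lists the options: https://scikit-learn.org/stable/modules/multiclass.html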
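
As a rough sketch of one option from that guide, the loop can wrap each estimator in OneVsRestClassifier, which accepts a 2-D label indicator matrix and fits one binary classifier per label. The data here is synthetic (make_multilabel_classification with made-up sizes), standing in for your x_train/y_train, and the empty-row handling in j_score is my own assumption, so adapt both to your case:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Synthetic stand-in for the question's data (sizes are arbitrary assumptions).
X, Y = make_multilabel_classification(
    n_samples=300, n_features=20, n_classes=5, random_state=0
)
x_train, x_test, y_train, y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0
)


def j_score(y_true, y_pred):
    """Mean per-sample Jaccard score, as a percentage."""
    intersection = np.minimum(y_true, y_pred).sum(axis=1)
    union = np.maximum(y_true, y_pred).sum(axis=1)
    # Assumption: a sample with no true and no predicted labels scores 1.
    scores = np.where(union == 0, 1.0, intersection / np.maximum(union, 1))
    return scores.mean() * 100


classifiers = [
    SGDClassifier(random_state=0),
    LogisticRegression(solver='lbfgs', max_iter=1000),
    LinearSVC(),
]

predictions = {}
for classifier in classifiers:
    # One-vs-rest fits one binary model per label column of y_train.
    clf = OneVsRestClassifier(classifier)
    clf.fit(x_train, y_train)
    y_pred = clf.predict(x_test)
    name = classifier.__class__.__name__
    predictions[name] = y_pred
    print('Clf: ', name)
    print('Jaccard score: {}'.format(j_score(y_test, y_pred)))
    print('----')
```

With your real arrays, y_pred should come back with the same (n_samples, 66) shape as y_test, so the Jaccard helper can compare them row by row.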

Bluexm