
I'm trying to do multi-label classification. The shapes are:

x_train.shape
(3975, 3788)

y_train.shape
(3975, 66)

x_test.shape
(994, 3788)

y_test.shape
(994, 66)

When I try to train, I get the following error:

ValueError: bad input shape (3975, 66)

Is there any way to solve this? Here is the code:

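import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import LinearSVC

sgd = SGDClassifier()
lr = LogisticRegression(solver='lbfgs')
svc = LinearSVC()  # instantiate the estimator, not just the class

def j_score(y_true, y_pred):
  # Per-sample Jaccard similarity of 0/1 label rows, averaged and scaled to a percentage
  jaccard = np.minimum(y_true, y_pred).sum(axis=1) / np.maximum(y_true, y_pred).sum(axis=1)
  return jaccard.mean() * 100

def print_score(y_pred, clf):
  print('Clf: ', clf.__class__.__name__)
  print('Jaccard score: {}'.format(j_score(y_test, y_pred)))
  print('----')

for classifier in [sgd, lr, svc]:
  clf = OneVsOneClassifier(classifier)
  clf.fit(x_train, y_train)  # This is the line that raises the error
  y_pred = clf.predict(x_test)
  print_score(y_pred, classifier)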
  • Have you first checked which classifier is failing? Try printing the classifier right before the line that throws the error; that would narrow it down by a third. – Tadhg McDonald-Jensen Sep 19 '21 at 20:51
  • @TadhgMcDonald-Jensen it gave this output `SGDClassifier(alpha=0.0001, average=False, class_weight=None, early_stopping=False, epsilon=0.1, eta0=0.0, fit_intercept=True, l1_ratio=0.15, learning_rate='optimal', loss='hinge', max_iter=1000, n_iter_no_change=5, n_jobs=None, penalty='l2', power_t=0.5, random_state=None, shuffle=True, tol=0.001, validation_fraction=0.1, verbose=0, warm_start=False)` – hala mansour Sep 19 '21 at 20:53
  • Well, according to the [documentation of `SGDClassifier`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html#sklearn.linear_model.SGDClassifier.fit) it seems to expect `Y` to be one-dimensional. Do you want 66 different outputs for each sample? – Tadhg McDonald-Jensen Sep 19 '21 at 20:58
  • @TadhgMcDonald-Jensen I didn't fully understand your question, but the labels/targets are 66 columns. I'm trying to use classification to find out which labels each sample has. Here is a link to the Colab notebook, if that makes things clearer: [https://colab.research.google.com/drive/1S-T-1W_5MiW8Bo1sQQ0jyl-y_T-2hxEO#scrollTo=G7x85e97Quzn] – hala mansour Sep 19 '21 at 21:01
  • I haven't used sklearn before but I know how to read documentation and `X shape (n_samples, n_features), Y shape (n_samples,)` suggests it wants y to be only 1D. Hopefully someone who understands what is going on can help you because that is about the extent of my understanding. – Tadhg McDonald-Jensen Sep 19 '21 at 21:05
  • Neither `SGDClassifier` nor `LogisticRegression` support multi-label classification; for an explicit list of the scikit-learn algorithms that can do so, see the relevant [documentation](https://scikit-learn.org/stable/modules/multiclass.html). – desertnaut Sep 20 '21 at 10:38
  • I’m voting to close this question because it is about a non-issue: the used algorithms do not support multi-label classification, as clearly indicated in the relevant [documentation](https://scikit-learn.org/stable/modules/multiclass.html). – desertnaut Sep 20 '21 at 10:41

1 Answer


The models you are using do binary classification, i.e. they separate two classes of items. The target is provided as a vector Y with only one column, containing class names such as class1 and class2.
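A minimal sketch of the single-column target format described above. It assumes a synthetic dataset from `make_classification` in place of the question's data; the label names `class1`/`class2` are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for real data: 200 samples, 20 features, 2 classes.
X, y_int = make_classification(n_samples=200, n_features=20, n_classes=2, random_state=0)

# The target is ONE column: a 1-D array holding one class name per sample.
y = np.where(y_int == 0, "class1", "class2")
print(y.shape)  # (200,)

clf = LogisticRegression().fit(X, y)
print(clf.classes_)  # ['class1' 'class2']
```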

If there are more classes, Y is still a single column, now containing class1 ... classn. You can then use a strategy such as OneVsRestClassifier (OvR), which trains one binary classifier per class_i to discriminate it from the rest, or OneVsOneClassifier, which trains one binary classifier per pair of classes. This is multiclass classification.
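A sketch of the two strategies on a 4-class problem, assuming synthetic data from `make_classification`. The target is still one column; the difference shows up in how many binary models each wrapper fits: one per class for OvR, one per pair of classes (n·(n−1)/2) for OvO.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

# Synthetic multiclass data: still ONE target column, but with 4 distinct class values.
X, y = make_classification(n_samples=400, n_features=20, n_informative=6,
                           n_classes=4, random_state=0)
print(y.shape)  # (400,)

# One-vs-rest: one binary classifier per class (class_i vs. all the others).
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
# One-vs-one: one binary classifier per PAIR of classes.
ovo = OneVsOneClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

print(len(ovr.estimators_))  # 4
print(len(ovo.estimators_))  # 6, i.e. 4 * 3 / 2
```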

If you want to predict more than one output per sample (i.e. Y has more than one column), the problem is multi-label. For this you need models that support multi-label targets. DecisionTreeClassifier is one example, but SGDClassifier and LogisticRegression are not.
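A sketch contrasting a multi-label-capable model with the ones from the question, assuming synthetic data from `make_multilabel_classification` (5 label columns standing in for the question's 66). The exact `ValueError` message depends on the scikit-learn version; older releases report "bad input shape".

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsOneClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic multi-label data: Y is a 0/1 indicator matrix with one column per label.
X, Y = make_multilabel_classification(n_samples=300, n_features=20, n_classes=5,
                                      random_state=0)
print(Y.shape)  # (300, 5)

# DecisionTreeClassifier accepts a 2-D target and predicts one column per label.
tree = DecisionTreeClassifier(random_state=0).fit(X, Y)
print(tree.predict(X[:3]).shape)  # (3, 5)

# SGDClassifier, and OneVsOneClassifier wrapping it, reject a 2-D target.
rejected = []
for est in (SGDClassifier(), OneVsOneClassifier(SGDClassifier())):
    try:
        est.fit(X, Y)
    except ValueError as exc:
        rejected.append(type(est).__name__)
        print(type(est).__name__, "->", exc)
```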

If your labels are not correlated, you can also run exactly the same code once per label, each time providing only one column of Y (i.e. one binary problem per label).
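A sketch of the per-column approach, assuming synthetic data from `make_multilabel_classification` with `allow_unlabeled=False` so that every sample has at least one label. It also shows `OneVsRestClassifier`, which accepts a 0/1 indicator matrix and fits one binary model per column, so it should give essentially the same predictions as the manual loop. `jaccard_score(..., average="samples")` is scikit-learn's sample-averaged Jaccard score; with no all-empty label rows it matches the question's `j_score` divided by 100.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import jaccard_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

X, Y = make_multilabel_classification(n_samples=500, n_features=20, n_classes=5,
                                      allow_unlabeled=False, random_state=0)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=0)

# Option 1: one independent binary model per column of Y, stacked back into a matrix.
manual_pred = np.column_stack([
    LogisticRegression(max_iter=1000).fit(X_train, Y_train[:, j]).predict(X_test)
    for j in range(Y_train.shape[1])
])

# Option 2: OneVsRestClassifier does the same per-column fitting for an indicator matrix.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_train, Y_train)
ovr_pred = ovr.predict(X_test)

print(manual_pred.shape, ovr_pred.shape)  # (100, 5) (100, 5)
score = jaccard_score(Y_test, ovr_pred, average="samples")
print(score)
```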

To understand the difference and see which models support which setting, please look at https://scikit-learn.org/stable/modules/multiclass.html

Bluexm