
I am doing multi-class classification. The shape of the data is (299, 6) and the shape of the labels is (299, 5). Here is a sample of the data:

[[0.004873972,0.069813839,-0.470500136,2.285885634,0.5335,0.052915143],
[0.001698812,0.041216647,-0.01333925,2.507806584,0.2332,0.123463255],
[0.005954432,0.077164967,4.749752766,26.45721079,0.1663,0.186452725],
[0.001792197,0.042334345,-0.176201652,1.9656153,0.4001,0.087055596],
[0.001966929,0.044350068,0.182059972,1.610369693,0.55,0.29675874]]

Here are the labels for this data, stored in a CSV file: [[1,0,0,0,0],[0,0,0,1,0],[0,0,0,1,0],[0,0,1,0,0],[0,1,0,0,0]]

I tried SVM and logistic regression, but both give me the error `ValueError: bad input shape (299, 5)`. The error comes from the labels, but how can I resolve this?

[sample dataset][1]
  [1]: https://i.stack.imgur.com/Wncqy.png
yatu

1 Answer


You can treat it as a standard classification task: convert the one-hot encoding to class labels and train an SVM classifier. See the sample code:

import numpy as np
from sklearn.svm import SVC

data = np.array([[0.004873972,0.069813839,-0.470500136,2.285885634,0.5335,0.052915143],
                 [0.001698812,0.041216647,-0.01333925,2.507806584,0.2332,0.123463255],
                 [0.005954432,0.077164967,4.749752766,26.45721079,0.1663,0.186452725],
                 [0.001792197,0.042334345,-0.176201652,1.9656153,0.4001,0.087055596],
                 [0.001966929,0.044350068,0.182059972,1.610369693,0.55,0.29675874]])
outputs = np.array([[1,0,0,0,0],[0,0,0,1,0],[0,0,0,1,0],[0,0,1,0,0],[0,1,0,0,0]])
labels = np.argmax(outputs, axis=1)  # convert one-hot rows to class indices: [0, 3, 3, 2, 1]

clf = SVC()
clf.fit(data, labels)
print(clf.score(data, labels))  # training accuracy on this toy sample
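Note that `clf.score(data, labels)` above reports accuracy on the training data itself. For the full (299, 6) dataset, a held-out split gives a more honest estimate; a minimal sketch using synthetic stand-in data (since the real CSV is not included here):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in with the question's shapes: 299 samples, 6 features, 5 classes.
X, y = make_classification(n_samples=299, n_features=6, n_informative=4,
                           n_classes=5, random_state=0)

# Hold out a test set so the score reflects generalization, not memorization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
clf = SVC()
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)  # accuracy on unseen data
print(score)
```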

For parameter tuning, have a look at [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74) and [Comparing randomized search and grid search for hyperparameter estimation](https://scikit-learn.org/stable/auto_examples/model_selection/plot_randomized_search.html).
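The linked articles cover this in depth; as a minimal sketch, scikit-learn's `GridSearchCV` can tune `C` and `gamma` for the SVC (the grid values here are illustrative, not tuned for this data, and the data is again a synthetic stand-in):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in matching the question's shapes (299, 6) with 5 classes.
X, y = make_classification(n_samples=299, n_features=6, n_informative=4,
                           n_classes=5, random_state=0)

# Illustrative grid; widen the ranges for a real search.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": ["scale", 0.01, 0.1, 1]}
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```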

Jirka
  • this code works, thank you; but the accuracy it gives is very low. Do you know any method that gives good results for this type of data? – shaheen Jan 10 '19 at 10:38
  • this is just the default configuration; you can run a hyper-parameter search and test several other classifiers, see the [create_classif_search_train_export](https://github.com/Borda/pyImSegm/blob/master/imsegm/classification.py#L651) function, which searches parameters, trains the best classifier and exports it to a file – Jirka Jan 10 '19 at 11:29
  • can you please guide me on tuning the hyper-parameters? I did not get good results; what do I have to do to achieve good accuracy? – shaheen Jan 11 '19 at 11:29
  • have a look at [Hyperparameter Tuning the Random Forest in Python](https://towardsdatascience.com/hyperparameter-tuning-the-random-forest-in-python-using-scikit-learn-28d2aa77dd74) and [Comparing randomized search and grid search for hyperparameter estimation](https://scikit-learn.org/stable/auto_examples/model_selection/plot_randomized_search.html) – Jirka Jan 11 '19 at 13:39