I'm new to sklearn and machine learning. I have a CSV file containing mappings of the following form:

ID-2001-0001, ID-category_1
ID-2002-0002, ID-category_2
...

I have around 1010 unique IDs and 123 unique categories. Now I want to classify around 1000 other IDs. For that, I train a classifier on 800 of the 1010 already-classified IDs, using sklearn. With SVM I get the same prediction for all of the remaining 200 IDs; with GradientBoosting I get around 1.4% accuracy. Is this because of the small data size? Basically, for each ID I pass its 100-dimensional word2vec vector (e.g. the vector for ID-2001-0001) and its corresponding category to the fit method.
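A minimal sketch of my SVM attempt (default hyperparameters, which is what I used; the variable names match the GradientBoosting snippet below):

from sklearn.svm import SVC

clf = SVC()  # defaults: RBF kernel, C=1.0
clf.fit(IDVectorMatrix, categoryMatrix)
result = clf.predict(IDVectorTestMatrix)  # every entry comes back as the same category

With GradientBoosting: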
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier()
# IDVectorMatrix holds one 100-dimensional row per ID, taken from the
# pre-trained word2vec model, e.g. model['ID-2001-0001']
clf.fit(IDVectorMatrix, categoryMatrix)
result = clf.predict(IDVectorTestMatrix)  # word2vec vectors of the 200 held-out IDs
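In case it helps, here is a self-contained sketch of my whole pipeline. The file names and the gensim loading call are placeholders (my real paths differ), and the random split is my assumption of how to hold out the 200 test IDs:

import csv
import numpy as np
from gensim.models import KeyedVectors
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

model = KeyedVectors.load("id_vectors.kv")  # placeholder: my pre-trained word2vec vectors

ids, categories = [], []
with open("mappings.csv") as f:  # placeholder path
    for row in csv.reader(f):
        ids.append(row[0].strip())
        categories.append(row[1].strip())

X = np.array([model[i] for i in ids])  # shape (1010, 100): one vector per ID
y = np.array(categories)               # 123 distinct category strings

# 800 IDs for training, the remaining ~200 held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=800, random_state=0)

clf = GradientBoostingClassifier()
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))  # this is where I see ~1.4%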
Am I doing this classification right, or am I missing something? I'd appreciate any help. Thanks.