I'm new to sklearn and machine learning. I have a CSV file containing mappings of the following form:

ID-2001-0001, ID-category_1
ID-2002-0002, ID-category_2
...

I have around 1010 unique IDs and 123 unique categories. Now I want to classify around 1000 other IDs. For that, I train a classifier on 800 of the 1010 already-classified IDs, using sklearn. With SVM I get the same prediction for all of the remaining 200 IDs; with GradientBoosting I get around 1.4% accuracy. Is this because of the small data size? Basically, for each ID I pass its 100-dimensional word2vec vector (e.g. the vector for ID-2001-0001) and its corresponding category to the fit method.
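A minimal sketch of my SVM attempt (default hyperparameters, which is what I used; the variable names match the GradientBoosting snippet below):

from sklearn.svm import SVC

clf = SVC()  # defaults: RBF kernel, C=1.0
clf.fit(IDVectorMatrix, categoryMatrix)
result = clf.predict(IDVectorTestMatrix)  # every entry comes back as the same category

With GradientBoosting: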
from sklearn.ensemble import GradientBoostingClassifier

clf = GradientBoostingClassifier()
# IDVectorMatrix holds one 100-dimensional row per ID, taken from the
# pre-trained word2vec model, e.g. model['ID-2001-0001']
clf.fit(IDVectorMatrix, categoryMatrix)
result = clf.predict(IDVectorTestMatrix)  # word2vec vectors of the 200 held-out IDs
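In case it helps, here is a self-contained sketch of my whole pipeline. The file names and the gensim loading call are placeholders (my real paths differ), and the random split is my assumption of how to hold out the 200 test IDs:

import csv
import numpy as np
from gensim.models import KeyedVectors
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

model = KeyedVectors.load("id_vectors.kv")  # placeholder: my pre-trained word2vec vectors

ids, categories = [], []
with open("mappings.csv") as f:  # placeholder path
    for row in csv.reader(f):
        ids.append(row[0].strip())
        categories.append(row[1].strip())

X = np.array([model[i] for i in ids])  # shape (1010, 100): one vector per ID
y = np.array(categories)               # 123 distinct category strings

# 800 IDs for training, the remaining ~200 held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=800, random_state=0)

clf = GradientBoostingClassifier()
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))  # this is where I see ~1.4%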
Am I doing this classification right, or am I missing something? I'd appreciate any help. Thanks.