I have text classification data where the prediction depends on two categories of input, 'descriptions' and 'components'. I can already do the classification using a bag-of-words model in Python with scikit-learn on 'descriptions' alone. But I want to get predictions using both categories in the bag of words, with a weight on each feature set, e.g. x = descriptions + 2 * components. How should I proceed?

Has QUIT--Anony-Mousse

javi_p
You can concatenate feature sets, and you can put weights on them, too. – Has QUIT--Anony-Mousse Sep 30 '15 at 17:45
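As a rough sketch of what concatenating weighted feature sets could look like in scikit-learn: each text field gets its own `CountVectorizer`, each count matrix is multiplied by its weight, and the two are stacked side by side with `scipy.sparse.hstack`. The toy data, the labels, the weights 1 and 2, and the choice of `LogisticRegression` are all assumptions made for illustration, not something taken from the question.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data; replace with the real 'descriptions' and 'components' columns.
descriptions = ["app crashes on login", "button color is wrong",
                "crash when saving file", "wrong font in header"]
components = ["auth backend", "ui theme", "storage backend", "ui theme"]
labels = ["bug", "cosmetic", "bug", "cosmetic"]

# One bag-of-words vocabulary per field, so the two feature sets stay separate.
desc_vec = CountVectorizer()
comp_vec = CountVectorizer()
X_desc = desc_vec.fit_transform(descriptions)
X_comp = comp_vec.fit_transform(components)

# Assumed weights, mirroring x = descriptions + 2 * components.
W_DESC, W_COMP = 1.0, 2.0

# Scale each block by its weight, then concatenate the columns.
X = hstack([W_DESC * X_desc, W_COMP * X_comp]).tocsr()

clf = LogisticRegression().fit(X, labels)


def predict(description, component):
    """Vectorize one new example with the fitted vocabularies and classify it."""
    x = hstack([W_DESC * desc_vec.transform([description]),
                W_COMP * comp_vec.transform([component])]).tocsr()
    return clf.predict(x)[0]
```

How much the weights matter depends on the classifier. With count-based models such as multinomial naive Bayes, doubling a block behaves like repeating its tokens; with regularized linear models it changes how strongly that block is penalized. scikit-learn's `ColumnTransformer` and `FeatureUnion` also accept a `transformer_weights` argument that applies the same kind of per-block scaling inside a pipeline.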
1 Answer
You can train individual classifiers for descriptions and components, and obtain a final score using score = w1 * score_descriptions + w2 * score_components, where each term is the output of the corresponding classifier.
The values of w1 and w2 should be obtained using cross validation.
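A minimal sketch of this weighted combination, under several assumptions: the toy data is made up, each classifier is a `CountVectorizer` plus `MultinomialNB` pipeline, the scores are class probabilities, and the weights are constrained to w1 + w2 = 1 and picked by a simple grid search over out-of-fold predictions.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; replace with the real columns and labels.
descriptions = ["app crashes on login", "button color is wrong",
                "crash when saving file", "wrong font in header",
                "login crashes the app", "header color looks wrong",
                "file saving crashes", "font is wrong on button"]
components = ["auth backend", "ui theme", "storage backend", "ui theme",
              "auth backend", "ui theme", "storage backend", "ui theme"]
y = np.array(["bug", "cosmetic", "bug", "cosmetic",
              "bug", "cosmetic", "bug", "cosmetic"])

# One classifier per field; the vectorizer lives inside the pipeline so it is
# refit on every training fold.
desc_clf = make_pipeline(CountVectorizer(), MultinomialNB())
comp_clf = make_pipeline(CountVectorizer(), MultinomialNB())

# Out-of-fold class probabilities from each classifier.
p_desc = cross_val_predict(desc_clf, descriptions, y, cv=2, method="predict_proba")
p_comp = cross_val_predict(comp_clf, components, y, cv=2, method="predict_proba")

classes = np.unique(y)  # same sorted order as the predict_proba columns

# Grid search over w1 (with w2 = 1 - w1), scored by accuracy of the fused prediction.
best_w, best_acc = None, -1.0
for w1 in np.linspace(0.0, 1.0, 11):
    w2 = 1.0 - w1
    fused = w1 * p_desc + w2 * p_comp
    acc = float(np.mean(classes[fused.argmax(axis=1)] == y))
    if acc > best_acc:
        best_w, best_acc = (float(w1), float(w2)), acc
```

After the weights are chosen, both pipelines would be refit on all of the training data and the same weighted sum applied to their `predict_proba` outputs for new examples.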
Alternatively, you can train a single multiclass classifier by combining the training dataset.
You will now have 4 classes:
- Neither 'descriptions' nor 'components'
- 'descriptions' but not 'components'
- not 'descriptions' but 'components'
- 'descriptions' and 'components'
And you can go ahead and train as usual.
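One way to read the four classes above is as an encoding of two yes/no flags into a single label. The sketch below assumes exactly that; the function names and the integer coding are hypothetical.

```python
def combined_label(has_first, has_second):
    """Map two binary flags to a single class id in {0, 1, 2, 3}."""
    return 2 * int(has_first) + int(has_second)


def split_label(label):
    """Recover the two flags from a combined class id."""
    return bool(label // 2), bool(label % 2)


# The four combinations, in the same order as the list above.
rows = [(False, False), (True, False), (False, True), (True, True)]
encoded = [combined_label(a, b) for a, b in rows]
```

The combined ids can then be passed as the target to any multiclass classifier, and `split_label` turns a predicted id back into the two flags.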

axiom
Is there a way to combine the two categories in the bag of words model itself, instead of training classifiers separately? – javi_p Sep 30 '15 at 08:08