I have text classification data where the prediction depends on two categories of features, 'descriptions' and 'components'. I was able to do the classification in Python with scikit-learn using a bag of words on 'descriptions' alone. But I want predictions that use both categories in the bag of words, with a weight on each feature set, e.g. x = descriptions + 2 * components. How should I proceed?

Has QUIT--Anony-Mousse
javi_p

1 Answer

You can train individual classifiers for descriptions and components, and obtain a final score using score = w1 * descriptions + w2 * components, where each term is the score produced by the corresponding classifier.

The values of w1 and w2 should be obtained using cross validation.
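A minimal sketch of this weighted combination with scikit-learn is shown here. The toy data, the choice of LogisticRegression, the constraint w2 = 1 - w1, and the helper name out_of_fold_proba are all assumptions made for illustration; none of them come from the question.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: 0 = bug report, 1 = feature request.
descriptions = [
    "app crashes on login", "crash when saving file", "error dialog at startup",
    "add dark mode option", "support export to pdf", "new keyboard shortcut request",
]
components = [
    "auth backend", "storage backend", "ui backend",
    "ui frontend", "export frontend", "ui frontend",
]
labels = np.array([0, 0, 0, 1, 1, 1])


def out_of_fold_proba(texts, y, cv):
    """Class probabilities for each sample, predicted by a model that did not see it."""
    model = make_pipeline(CountVectorizer(), LogisticRegression())
    return cross_val_predict(model, texts, y, cv=cv, method="predict_proba")


cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
p_desc = out_of_fold_proba(descriptions, labels, cv)  # one classifier per feature set
p_comp = out_of_fold_proba(components, labels, cv)

# Grid-search w1 (with w2 = 1 - w1) on the out-of-fold predictions.
best_w1, best_acc = None, -1.0
for w1 in np.linspace(0.0, 1.0, 11):
    score = w1 * p_desc + (1.0 - w1) * p_comp
    # Columns follow the sorted class labels (0, 1), so argmax is the predicted label.
    acc = float(np.mean(score.argmax(axis=1) == labels))
    if acc > best_acc:
        best_w1, best_acc = float(w1), acc

print("w1 =", best_w1, "w2 =", 1.0 - best_w1, "accuracy =", best_acc)
```

On real data you would likely want a separate held-out set to report the final accuracy, since the weights are tuned on the same out-of-fold predictions.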

Alternatively, you can combine the training data and train a single multiclass classifier.

You will then have four classes:

  1. Neither 'descriptions' nor 'components'
  2. 'descriptions' but not 'components'
  3. 'components' but not 'descriptions'
  4. Both 'descriptions' and 'components'

And you can go ahead and train as usual.
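As a small sketch of how the four classes listed above could be encoded, two binary flags (one per category) can be mapped to a single class index. The function name combined_class and the flag names are hypothetical; the numbering follows the order of the list.

```python
def combined_class(first_flag: bool, components_flag: bool) -> int:
    """Map two binary flags to one of the four classes, numbered 1 to 4.

    1 = neither, 2 = first only, 3 = 'components' only, 4 = both.
    """
    return 1 + int(first_flag) + 2 * int(components_flag)


# Build multiclass targets from per-sample flag pairs (illustrative values).
flag_pairs = [(False, False), (True, False), (False, True), (True, True)]
targets = [combined_class(a, b) for a, b in flag_pairs]
```

The resulting targets can then be passed to any multiclass classifier in place of the two separate label columns.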

axiom
  • Is there a way to combine the two categories in the bag of words model itself, instead of training classifiers separately? – javi_p Sep 30 '15 at 08:08
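
Regarding the comment above: one possible way to combine both categories in a single bag-of-words matrix is to vectorize each field separately, scale the 'components' block by its weight, and stack the two blocks side by side, which mirrors x = descriptions + 2 * components from the question. This is only a sketch under assumptions: the toy data, the LogisticRegression classifier, and the helper name predict are illustrative. scikit-learn's FeatureUnion and ColumnTransformer also accept a transformer_weights argument that applies the same kind of per-block scaling.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: 0 = bug report, 1 = feature request.
descriptions = ["app crashes on login", "crash when saving file",
                "add dark mode option", "support export to pdf"]
components = ["auth backend", "storage backend", "ui frontend", "export frontend"]
labels = [0, 0, 1, 1]

COMPONENT_WEIGHT = 2.0  # the "2 *" from the question

# One vocabulary per field, so the same word in both fields stays distinct.
desc_vec = CountVectorizer()
comp_vec = CountVectorizer()
X_desc = desc_vec.fit_transform(descriptions)
X_comp = comp_vec.fit_transform(components)

# Concatenate the columns; the components block is scaled by its weight.
X = hstack([X_desc, COMPONENT_WEIGHT * X_comp]).tocsr()
clf = LogisticRegression().fit(X, labels)


def predict(new_descriptions, new_components):
    """Apply the same vectorizers and weighting to unseen samples."""
    X_new = hstack([
        desc_vec.transform(new_descriptions),
        COMPONENT_WEIGHT * comp_vec.transform(new_components),
    ]).tocsr()
    return clf.predict(X_new)


print(predict(["crash on export"], ["export backend"]))
```

How much the weight matters depends on the classifier: with a regularized linear model, scaling a block changes how strongly its coefficients are penalized, so the weight is best treated as a hyperparameter and tuned by cross-validation.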