
I have trained a MultiClassClassifier (tested and working) and saved it to my hard drive. Now I want to make predictions for a new sample. When my application starts, the classifier loads with it. Outside the classification process, I have already narrowed the sample down to 5 possible classes. This means I know k classes that can safely be excluded from the classification.

Is it possible to filter a MultiClassClassifier (filter out all unwanted classes) before using it?

If so, which Weka method should I use for this? If not, is there an alternative solution?

I want to increase the accuracy of the classifier by narrowing its focus to 5 of the n classes.

I've found how to filter Instances objects, but I can't find a corresponding method for the MultiClassClassifier.

The data I have to work with are my test Instances and my MultiClassClassifier.

Thank you in advance.

c00ki3s

1 Answer


There isn't really a way to modify an existing MultiClassClassifier to exclude classes. However, depending on the underlying classifier you're using, you could try `distributionForInstance`, which returns a vector of confidence scores, one per class. You could then take the class with the highest score among your target set, ignoring the scores for all other classes.
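A minimal sketch of that selection step, assuming you already have the `double[]` returned by `distributionForInstance` and know the indices of the classes you want to keep. The class name `RestrictedPrediction`, the method `bestAllowedClass`, and the numbers in `main` are made up for illustration; none of them are part of Weka.

```java
import java.util.Set;

final class RestrictedPrediction {

    /**
     * Returns the index of the highest-scoring class among allowedClasses,
     * or -1 if none of the allowed indices exists in the distribution.
     */
    static int bestAllowedClass(double[] distribution, Set<Integer> allowedClasses) {
        int best = -1;
        for (int i = 0; i < distribution.length; i++) {
            if (!allowedClasses.contains(i)) {
                continue; // skip classes that were ruled out beforehand
            }
            if (best == -1 || distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // In a Weka application this array would come from
        // classifier.distributionForInstance(instance); here it is invented.
        double[] distribution = {0.40, 0.05, 0.20, 0.25, 0.10};
        Set<Integer> allowed = Set.of(2, 4); // hypothetical target classes
        System.out.println(bestAllowedClass(distribution, allowed));
    }
}
```

The returned index refers to the position in the class attribute, so in a Weka application you would map it back to a label through the class attribute of your Instances.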

nneonneo
  • Thank You for the fast response. I'm using the MultiClassClassifier with the option SMO and fitting of logistic models (-M) enabled. – c00ki3s Mar 09 '16 at 20:38
  • It should work then - I've used distributionForInstance with SMO MultiClassClassifier before and it works. Give it a try. – nneonneo Mar 09 '16 at 20:47
  • Will definitely try it out then. Will report back asap. Thank You. – c00ki3s Mar 09 '16 at 21:00
  • I've coded it and it looks really dirty. Ignoring parts of the distribution is kinda nasty, especially when the scores are spread almost equally across `k` classes. For example, I've got a sample where the correct class gets only 20% of the distribution. A score that low casts doubt on the classification. There should really be a way to filter a classifier so that `distributionForInstance` could be used properly in these cases. Thank you again for your time and explanation. Will keep an eye out for more solutions. Hopefully one day there will be one to solve this mess. – c00ki3s Mar 10 '16 at 00:44
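One way to make a low score such as 20% easier to interpret is to rescale the scores of the remaining classes so they sum to 1. This is only a sketch and assumes the scores behave like probabilities, which may not hold for every underlying classifier. `RestrictedDistribution` and `renormalize` are hypothetical names, not Weka methods.

```java
import java.util.Set;

final class RestrictedDistribution {

    /**
     * Zeroes the scores of all classes outside allowedClasses and rescales
     * the remaining scores to sum to 1. Returns all zeros if the allowed
     * classes carry no mass at all.
     */
    static double[] renormalize(double[] distribution, Set<Integer> allowedClasses) {
        double[] result = new double[distribution.length];
        double total = 0.0;
        for (int i = 0; i < distribution.length; i++) {
            if (allowedClasses.contains(i)) {
                total += distribution[i];
            }
        }
        if (total <= 0.0) {
            return result; // nothing to rescale
        }
        for (int i = 0; i < distribution.length; i++) {
            if (allowedClasses.contains(i)) {
                result[i] = distribution[i] / total;
            }
        }
        return result;
    }
}
```

With five equally scored classes at 0.2 each and two of them allowed, each allowed class ends up at 0.5, which shows the tie between the remaining candidates directly instead of hiding it behind a low raw score.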