
I'm looking at the MALLET source code, and it seems that most of the classifier implementations (e.g. naive Bayes) don't really take feature selection into account, even though the InstanceList class has a setFeatureSelection method.

Now I want to run some quick experiments on my datasets with feature selection involved. As a technical shortcut, I'm thinking of taking the lowest-ranking features and setting their values to 0 in the instance vectors. Is this equivalent, in machine-learning terms, to feature selection during classifier training, where the dropped features are not considered at all (assuming no smoothing, e.g. Laplace estimation, is involved)?
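For concreteness, here is a rough sketch of the shortcut I have in mind (lowRanked is a hypothetical java.util.BitSet holding the indices of the features I want to drop):

import java.util.BitSet;
import cc.mallet.types.FeatureVector;
import cc.mallet.types.Instance;
import cc.mallet.types.InstanceList;

// Hypothetical shortcut: destructively zero out low-ranked features
// in each instance's (sparse) feature vector.
BitSet lowRanked = /* indices of the lowest-ranking features */;
for (Instance inst : ilist) {
    FeatureVector fv = (FeatureVector) inst.getData();
    for (int loc = 0; loc < fv.numLocations(); loc++) {
        if (lowRanked.get(fv.indexAtLocation(loc))) {
            fv.setValueAtLocation(loc, 0.0);
        }
    }
}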

thank you

goh
  • Have you worked on feature selection using information gain? I need some help; waiting for your reply. – Ashish Dec 28 '13 at 07:15

1 Answer


Yes, setting the feature value to zero will have the same effect as removing it from the feature vector, since MALLET has no notion of "missing features," only zero and nonzero feature values.

Using the FeatureSelection class isn't too painful, though. MALLET comes with several built-in classes that apply a "mask" under the hood based on RankedFeatureVector subclasses. For example, to use information gain feature selection, you should just be able to do this:

// keep the numFeatures top-ranked features by information gain
FeatureSelection fs = new FeatureSelection(new InfoGain(ilist), numFeatures);
ilist.setFeatureSelection(fs);
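Putting it together, here's a minimal end-to-end sketch (ilist and the cutoff of 50 are illustrative, and I'm using MaxEntTrainer since, as far as I can tell, it consults the feature-selection mask during training):

import cc.mallet.classify.Classifier;
import cc.mallet.classify.MaxEntTrainer;
import cc.mallet.types.FeatureSelection;
import cc.mallet.types.InfoGain;
import cc.mallet.types.InstanceList;

// Rank features by information gain, keep the 50 highest-scoring ones,
// and train a classifier on the masked instance list.
int numFeatures = 50;  // illustrative cutoff
FeatureSelection fs = new FeatureSelection(new InfoGain(ilist), numFeatures);
ilist.setFeatureSelection(fs);
Classifier classifier = new MaxEntTrainer().train(ilist);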

You can also implement your own RankedFeatureVector subclass (see the RankedFeatureVector API docs) for something more customized. If you'd rather select features manually some other way, you can do that too: create a feature mask as a BitSet containing all the feature ids (from the Alphabet) that you want to keep, e.g.:

java.util.BitSet featureMask = /* some code to pick your features */;
FeatureSelection fs = new FeatureSelection(ilist.getDataAlphabet(), featureMask);
ilist.setFeatureSelection(fs);
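For instance, here's a hedged sketch that keeps only features whose string form is at least three characters long (the criterion is purely illustrative; any rule that maps a feature id to keep/drop works the same way):

import java.util.BitSet;
import cc.mallet.types.Alphabet;
import cc.mallet.types.FeatureSelection;

// Illustrative criterion: keep features whose string form has >= 3 characters.
Alphabet alphabet = ilist.getDataAlphabet();
BitSet featureMask = new BitSet(alphabet.size());
for (int i = 0; i < alphabet.size(); i++) {
    if (alphabet.lookupObject(i).toString().length() >= 3) {
        featureMask.set(i);
    }
}
ilist.setFeatureSelection(new FeatureSelection(alphabet, featureMask));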

In general, I recommend using FeatureSelection objects instead of destructively changing the instance data.

burr
  • I would love to use FeatureSelection. The thing is, I've looked at most of the classifier implementations, and it seems most of them don't take feature selection into account. For instance, running the NaiveBayes classifier with a feature selection set yields the same result as without one. Thanks for your help anyway; I ended up extending the NaiveBayesTrainer class to work with FeatureSelection. – goh Oct 14 '13 at 09:48