I am trying to instantiate a naive Bayes classifier to classify text blocks (with a pre-defined classification). The example below just tries to do it with male/female. I have tried loading data from file (CSVloader) and by creating instances below. The problem is the trainer.train() method throws a null pointer exception. It seems to be because the targetDictionary is null. The data dictionary is populated. How do I force the targetDictionary to be populated on an instance?
My actual goal is to classify paper abstracts I have in a database as being 'Science, Politics, Law, Health etc. It appears the Bayes classifier is the right choice for this.
I've iterated over the loaded instanceList and it seems to be populated correctly, and the dataDictionary is populated, but TargetDictionary is null.
Using Mallet 2.0.8 on windows
public TestMallet() throws IOException {
ArrayList<Pipe> pipelist = new ArrayList<Pipe>();
pipelist.add (new CharSequenceLowercase() ) ;
pipelist.add (new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")) ) ;
pipelist.add (new TokenSequenceRemoveStopwords (new File ("c:\\test\\config\\stopwords_en.txt"), "UTF-8", false, false, false) ) ;
pipelist.add (new TokenSequence2FeatureSequence()) ;
pipelist.add (new FeatureSequence2FeatureVector()) ; // Added but doesnt make any difference
InstanceList instances = new InstanceList (new SerialPipes(pipelist)) ;
Instance instance0 = new Instance("Hello World I am here and i am male my name is roger", "Male", "roger", "test") ;
Instance instance1 = new Instance("Hello World I am here and i am male my name is phil", "Male", "phil", "test") ;
Instance instance2 = new Instance("Hello World I am here and i am male my name is joe", "Male", "joe", "test") ;
Instance instance3 = new Instance("Hello World I am here and i am female my name is vira", "Female", "vira", "test") ;
Instance instance4 = new Instance("Hello World I am here and i am female my name is josie", "Female", "josie", "test") ;
instances.addThruPipe (instance0) ;
instances.addThruPipe (instance1) ;
instances.addThruPipe (instance2) ;
instances.addThruPipe (instance3) ;
instances.addThruPipe (instance4) ;
// Using Instance List to train
// ----------------------------
ClassifierTrainer trainer = new NaiveBayesTrainer();
trainer.train(instances);
// Null pointer exception here ( debugging, it looks like TargetDictionary is null)
}
Expecting trainer to analyse correctly.