I'm using WEKA to train a categorization program written in Java. There are initially several categories, let's say 10, and the system must start training with those initial categories. To set that up, I build the dataset like this:

String[] categories = {"cat1", "cat2", ..., "cat10"};

public SomeClassifier(String[] categories) {
    // Creates a FastVector of attributes.
    FastVector attributes = new FastVector(3);

    // Add attribute for holding property one.
    attributes.addElement(new Attribute(P1_ATTRIBUTE, (FastVector) null));

    // Add attribute for holding property two.
    attributes.addElement(new Attribute(P2_ATTRIBUTE, (FastVector) null));

    // Add values attribute.
    FastVector values = new FastVector(categories.length);
    for (int i = 0; i < categories.length; i++) {
        values.addElement(categories[i]);
    }

    attributes.addElement(new Attribute(CATEGORY_ATTRIBUTE, values));

    // Create dataset with initial capacity of 25, and set index
    Instances myInstances = new Instances(SOME_NAME, attributes, 25);
    myInstances.setClassIndex(myInstances.numAttributes() - 1);
}

Now, some time later, I want to add a new category (let's say "cat11") to my training set, which is already being trained with some success. How can I accomplish this? The WEKA documentation says "Once an attribute has been created, it can't be changed". So maybe I can take the Attribute out of the Instances object, recreate it with the extra value, and insert it again... or will that mess everything up?

Thanks in advance.

Azurlake

1 Answer

OK, apparently there is no way to do this with this implementation of Naïve Bayes. When the classifier is initialized, the probabilities of all the categories given to it must sum to 1, so introducing a new category with probability != 0 while the classifier is being trained would push the sum above 1 and make the classifier behave strangely. Moreover, the classifier may set up its algorithm (the calculation of conditional probabilities and its iterations) based on the number of categories, so adding a new one after creation would mean rebuilding the algorithm in some way.
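To make the "must sum to 1" point concrete, here is a minimal plain-Java sketch. It does not use WEKA and is not how WEKA stores its priors internally; the class name ClassPriors and its two methods are hypothetical, made up for illustration. It shows that bolting a new category with nonzero probability onto already-normalized priors breaks the sum, while recomputing the priors from raw counts keeps it at 1.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of WEKA.
public class ClassPriors {

    /** Turns per-category example counts into relative frequencies. */
    public static Map<String, Double> fromCounts(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("need at least one training example");
        }
        Map<String, Double> priors = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            priors.put(e.getKey(), e.getValue() / (double) total);
        }
        return priors;
    }

    /** Sums all prior probabilities; a valid distribution gives 1 (up to rounding). */
    public static double sum(Map<String, Double> priors) {
        double s = 0.0;
        for (double p : priors.values()) {
            s += p;
        }
        return s;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("cat1", 6);
        counts.put("cat2", 4);

        Map<String, Double> priors = fromCounts(counts);
        System.out.println("normalized sum: " + sum(priors));

        // Adding a category directly to the frozen priors breaks normalization.
        priors.put("cat11", 0.1);
        System.out.println("after bolting on cat11: " + sum(priors));

        // Recomputing from counts that include the new category restores it.
        counts.put("cat11", 2);
        System.out.println("recomputed sum: " + sum(fromCounts(counts)));
    }
}
```

The takeaway is that the counts are the part that can grow safely; the probabilities have to be derived from them again whenever a category appears.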

So, that leaves a question open... what classification mechanism can I use that allows me to introduce new categories over time?
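One possible direction, sketched here in plain Java rather than with any WEKA class: a count-based multinomial Naïve Bayes that keeps only raw counts in maps and derives probabilities at classification time. Under that design a category simply comes into existence the first time an example for it is trained. Everything below (the class name OpenCategoryNaiveBayes, the token-array input, the add-one smoothing with one extra slot for unseen tokens) is an assumption made for illustration, not something taken from the question or from WEKA.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch for illustration only; not a WEKA classifier.
public class OpenCategoryNaiveBayes {
    // Number of training examples seen per category.
    private final Map<String, Integer> docCounts = new HashMap<>();
    // Per-category token frequencies.
    private final Map<String, Map<String, Integer>> tokenCounts = new HashMap<>();
    // Total number of tokens seen per category.
    private final Map<String, Integer> tokenTotals = new HashMap<>();
    // All distinct tokens seen so far, across categories.
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one example; an unseen category is created on the fly. */
    public void train(String category, String[] tokens) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                tokenCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
            tokenTotals.merge(category, 1, Integer::sum);
            vocabulary.add(t);
        }
    }

    /** Returns the most probable category, or null if nothing was trained. */
    public String classify(String[] tokens) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            // Prior is derived from counts, so it renormalizes automatically.
            double score = Math.log(docCounts.get(category) / (double) totalDocs);
            Map<String, Integer> counts = tokenCounts.get(category);
            int total = tokenTotals.getOrDefault(category, 0);
            for (String t : tokens) {
                int c = counts.getOrDefault(t, 0);
                // Add-one smoothing; the extra +1 reserves mass for unseen tokens.
                score += Math.log((c + 1.0) / (total + vocabulary.size() + 1.0));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

Usage would look like calling train("cat1", ...) through train("cat10", ...) at first and, later on, train("cat11", ...) with no re-initialization. The trade-off is that a new category starts with very few examples, so its estimates are weak until more data arrives.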

Azurlake