I am trying to write a Java program that calls Weka's CfsSubsetEval class to perform feature subset selection. CfsSubsetEval discretizes the dataset, which I want to avoid because my dataset is already discretized. These are the lines in CfsSubsetEval.java that perform the discretization:
```java
m_isNumeric = m_trainInstances.attribute(m_classIndex).isNumeric();
if (!m_isNumeric)
{
    m_disTransform = new Discretize();
    m_disTransform.setUseBetterEncoding(true);
    m_disTransform.setInputFormat(m_trainInstances);
    m_trainInstances = Filter.useFilter(m_trainInstances, m_disTransform);
}
```
The class attribute is declared in the ARFF file as follows:
```
@ATTRIBUTE class {true,false}
```
so it is nominal rather than numeric, and the discretization is therefore performed.
Although I have only limited knowledge of Weka's implementation, I tried commenting out these lines to skip the discretization. However, this did not work, and the following exception was thrown:
```
java.lang.ArrayIndexOutOfBoundsException: 1
    at weka.attributeSelection.CfsSubsetEval.symmUncertCorr(CfsSubsetEval.java:515)
    at weka.attributeSelection.CfsSubsetEval.correlate(CfsSubsetEval.java:445)
    at weka.attributeSelection.CfsSubsetEval.evaluateSubset(CfsSubsetEval.java:392)
    at weka.attributeSelection.BestFirst.search(BestFirst.java:806)
    at weka.attributeSelection.AttributeSelection.SelectAttributes(AttributeSelection.java:606)
    at selecting_features.runFeatureSelection.main(runFeatureSelection.java:39)
```
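For context, my understanding (an assumption, since I have not traced Weka's code in detail) is that symmUncertCorr computes the symmetric uncertainty between two attributes, SU(X, Y) = 2 * [H(X) + H(Y) - H(X, Y)] / [H(X) + H(Y)], from a contingency table whose dimensions come from each attribute's number of nominal values plus one extra slot reserved for missing values. The stand-alone sketch below is not Weka's code; the class SymmetricUncertaintySketch and its methods are hypothetical. It illustrates how skipping the filter could fail this way: an attribute that declares zero nominal values (as a numeric one does) gets a table dimension of length 1, so any raw value of 1 or more falls outside it, which would be consistent with the `ArrayIndexOutOfBoundsException: 1` above.

```java
// Hypothetical sketch, NOT Weka's implementation: symmetric uncertainty between
// two discrete attributes, using an ASSUMED table layout of (numValues + 1) per dimension.
public class SymmetricUncertaintySketch {

    // Entropy in bits of a distribution given as raw counts summing to total.
    static double entropy(double[] counts, double total) {
        double h = 0.0;
        for (double c : counts) {
            if (c > 0) {
                double p = c / total;
                h -= p * Math.log(p) / Math.log(2.0);
            }
        }
        return h;
    }

    // x and y hold 0-based nominal codes; numX and numY are the declared numbers of values.
    static double symmetricUncertainty(int[] x, int numX, int[] y, int numY) {
        // Assumed layout: one extra row/column reserved for missing values (unused here).
        double[][] joint = new double[numX + 1][numY + 1];
        for (int k = 0; k < x.length; k++) {
            // If an attribute declares 0 values, its dimension has length 1,
            // so any code >= 1 throws ArrayIndexOutOfBoundsException here.
            joint[x[k]][y[k]]++;
        }
        double n = x.length;
        double[] rowSums = new double[numX + 1];
        double[] colSums = new double[numY + 1];
        double[] cells = new double[(numX + 1) * (numY + 1)];
        int idx = 0;
        for (int i = 0; i <= numX; i++) {
            for (int j = 0; j <= numY; j++) {
                rowSums[i] += joint[i][j];
                colSums[j] += joint[i][j];
                cells[idx++] = joint[i][j];
            }
        }
        double hx = entropy(rowSums, n);
        double hy = entropy(colSums, n);
        double hxy = entropy(cells, n);
        double denom = hx + hy;
        // SU = 2 * (H(X) + H(Y) - H(X,Y)) / (H(X) + H(Y)); define it as 0 when both entropies are 0.
        return denom == 0.0 ? 0.0 : 2.0 * (hx + hy - hxy) / denom;
    }

    public static void main(String[] args) {
        int[] attr = {0, 1, 0, 1, 1, 0};
        int[] cls = {0, 1, 0, 1, 1, 0};
        // Both attributes declared with 2 nominal values: identical columns give SU = 1.
        System.out.println("SU = " + symmetricUncertainty(attr, 2, cls, 2));
        try {
            // Same codes, but the attribute is declared with 0 values, like a numeric attribute.
            symmetricUncertainty(attr, 0, cls, 2);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("Out of bounds: " + e.getMessage());
        }
    }
}
```

Running main should print an SU of (approximately) 1.0 for the two identical attributes and then report the out-of-bounds index once the same codes are passed in under a zero-value declaration. The exact exception message depends on the JVM version.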
My question is: how can I modify CfsSubsetEval.java so that it does not discretize the dataset?
Your help is deeply appreciated.