I'm new to data mining and Weka. Using the Weka GUI, I built a J48 classifier (using the training set) for an attribute of interest with five levels. I have to evaluate the precision of the model, but I'm not sure how to do it. Here is some output that may be relevant:

=== Detailed Accuracy By Class ===
Precision
0.80
?
0.67
0.56
?
?

First, I would like to know the meaning of the "?" in the precision column. When I tried an attribute of interest with two levels, I got no "?". The tree is also bigger now than it was with two levels. I wonder whether this means that using an attribute of interest with five levels produces a less efficient tree in terms of classification and computation time. This seems fairly obvious, since the share of Correctly Classified Instances was up to 72% when the attribute had two levels.

Thank you in advance, all interesting answers will be rewarded!

fina

1 Answer

"I would like to know the meaning of the "?" in the precision column"

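Note that for these same classes the TP and FP rates are 0. It appears that J48 has not assigned any of your observations to these classes. Precision is TP / (TP + FP), so when nothing is predicted as a class the ratio is 0/0, which is undefined and is reported as "?".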
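To illustrate the point about classes that receive no predictions, here is a minimal Python sketch that computes per-class precision from a confusion matrix. The function name and the 3-class matrix are made up for illustration (they are not your data and this is not Weka's code); an undefined precision is returned as `None`, which plays the role of the "?".

```python
def per_class_precision(confusion):
    """Per-class precision from a square confusion matrix.

    confusion[i][j] = number of instances whose actual class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    precisions = []
    for j in range(n):
        # Everything predicted as class j, right or wrong (TP + FP).
        predicted_as_j = sum(confusion[i][j] for i in range(n))
        if predicted_as_j == 0:
            # No instance was assigned to class j: 0/0 is undefined.
            precisions.append(None)
        else:
            precisions.append(confusion[j][j] / predicted_as_j)
    return precisions


# Hypothetical counts: the middle class is never predicted.
example = [
    [8, 0, 2],
    [3, 0, 1],
    [2, 0, 4],
]
result = per_class_precision(example)
print(result)
```

The middle column of the example is all zeros, so its precision comes back as `None`, while the other two classes get ordinary ratios.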

Are these classes relatively small? If so, you might want to consider using the ClassBalancer filter. This will use weights to make all classes look the same size.
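As a rough sketch of the weighting idea (not Weka's actual implementation), the weights can be chosen so that every class has the same total weight while the overall sum of weights stays equal to the number of instances. The function name and the toy labels below are assumptions for illustration only.

```python
from collections import Counter


def balanced_weights(labels):
    """Return one weight per instance so all classes have equal total weight.

    An instance of class c gets weight N / (K * count_c), where N is the
    number of instances and K the number of distinct classes.
    """
    counts = Counter(labels)
    total = len(labels)
    num_classes = len(counts)
    return [total / (num_classes * counts[y]) for y in labels]


# Toy, made-up labels: one large class, one medium, one tiny.
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
weights = balanced_weights(labels)

# Total weight per class after reweighting.
class_totals = {}
for y, w in zip(labels, weights):
    class_totals[y] = class_totals.get(y, 0.0) + w
print(class_totals)
```

Each class ends up with the same total weight, so the tiny class "c" counts as much as the large class "a" when the tree is built.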

Of course, after you get the model you need to "convert back" to the real situation. This is similar to correcting for physical oversampling or undersampling. See my answer here: https://stats.stackexchange.com/questions/211174/how-to-exact-prediction-from-over-sampled-dataundoing-oversampling/257507#257507
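One common way to do this "converting back" is a prior correction: rescale each predicted class probability by the ratio of the real class proportion to the proportion the model was trained on, then renormalize. The sketch below assumes that approach (it is not necessarily the exact procedure in the linked answer), and all numbers and names are hypothetical.

```python
def correct_for_priors(probs, train_priors, true_priors):
    """Rescale predicted class probabilities from training priors to true priors.

    probs        : probabilities predicted by the model trained on balanced data
    train_priors : class proportions the model saw (uniform after balancing)
    true_priors  : class proportions in the real population
    """
    unnormalized = [
        p * true / train
        for p, train, true in zip(probs, train_priors, true_priors)
    ]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


# Hypothetical 3-class example: balanced training, skewed real proportions.
balanced_probs = [0.5, 0.3, 0.2]
train_priors = [1 / 3, 1 / 3, 1 / 3]
true_priors = [0.6, 0.3, 0.1]
corrected = correct_for_priors(balanced_probs, train_priors, true_priors)
print(corrected)
```

After the correction, the probability of the class that is common in reality goes up and the probability of the rare class goes down, which undoes the effect of making all classes look the same size.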

zbicyclist
  • Thank you, zbicyclist. I was wondering if you agree with this statement: "the accuracy is smaller since the algorithm classifies less efficiently", meaning that having relatively small classes, due to splitting the attribute into many levels, affects J48's performance. – fina Apr 08 '19 at 11:38
  • Pretty much, except I wouldn't use the word "efficiently". The accuracy is smaller due to the larger number of classes. – zbicyclist Apr 09 '19 at 03:58
  • Agreed. It is probably better to talk about efficiency when referring to the computation time, or the "Time taken to build model", which in this case is a bit longer. Thank you. – fina Apr 09 '19 at 07:55