I've been running a dataset through Weka with the Naive Bayes (NB) classifier, and I'm stuck on the following problem: while analyzing the output, I noticed a difference between the totals shown in the attributes section of the model and the number of instances reported in the log.
If you sum the [total] row for the "a0" attribute (522 + 522), Weka reports 1044 instances. If you check "Instances", it says 1036.
The dataset actually contains 1036 instances.
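To make the mismatch concrete, here is a minimal Python sketch that re-adds the a0 counts copied from the log below. The variable names are my own and have nothing to do with Weka's API. The gap comes out to 8, which happens to equal the number of cells in the a0 table (4 values x 2 classes); whether that reflects some per-cell adjustment inside Weka is only a guess on my part, not something the log confirms.

```python
# Counts copied by hand from the a0 rows of the log: (class 0, class 1).
# Names here are illustrative only; none of this is Weka's API.
a0_counts = {
    "1": (105.0, 175.0),
    "2": (112.0, 165.0),
    "3": (153.0, 109.0),
    "4": (152.0, 73.0),
}
reported_instances = 1036  # the "Instances:" line in the log

# Sum each class column, then both columns together.
per_class_totals = [
    sum(row[k] for row in a0_counts.values()) for k in range(2)
]
a0_sum = sum(per_class_totals)

# Difference between the model table and the reported instance count.
gap = a0_sum - reported_instances

# Number of (value, class) cells in the a0 table, for comparison with the gap.
cells = len(a0_counts) * 2

print("per-class totals:", per_class_totals)
print("a0 sum:", a0_sum, "gap:", gap, "cells:", cells)
```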
Does anyone have an explanation for this? Thanks.
Here's the log:
=== Run information ===
Scheme:       weka.classifiers.bayes.NaiveBayes
Relation:     teste.carro
Instances:    1036
Attributes:   7
              a0
              a1
              a2
              a3
              a4
              a5
              class
Test mode:    evaluate on training data
=== Classifier model (full training set) ===
Naive Bayes Classifier
                 Class
Attribute            0       1
                 (0.5)   (0.5)
==============================
a0
  1              105.0   175.0
  2              112.0   165.0
  3              153.0   109.0
  4              152.0    73.0
  [total]        522.0   522.0
a1
  1              101.0   165.0
  2              123.0   165.0
  3              136.0   119.0
  4              162.0    73.0
  [total]        522.0   522.0
a2
  1              150.0   107.0
  2              122.0   133.0
  3              121.0   141.0
  4              129.0   141.0
  [total]        522.0   522.0
a3
  1              247.0     1.0
  2              134.0   265.0
  3              140.0   255.0
  [total]        521.0   521.0
a4
  1              189.0   127.0
  2              177.0   185.0
  3              155.0   209.0
  [total]        521.0   521.0
a5
  1              244.0     1.0
  2              160.0   220.0
  3              117.0   300.0
  [total]        521.0   521.0
Time taken to build model: 0 seconds
=== Evaluation on training set ===
Time taken to test model on training data: 0.01 seconds
=== Summary ===
Correctly Classified Instances 957 92.3745 %
Incorrectly Classified Instances 79 7.6255 %
Kappa statistic 0.8475
Mean absolute error 0.1564
Root mean squared error 0.2398
Relative absolute error 31.2731 %
Root relative squared error 47.9651 %
Coverage of cases (0.95 level) 100 %
Mean rel. region size (0.95 level) 80.2124 %
Total Number of Instances 1036
=== Detailed Accuracy By Class ===
                 TP Rate  FP Rate  Precision  Recall   F-Measure  MCC      ROC Area  PRC Area  Class
                 0,847    0,000    1,000      0,847    0,917      0,858    0,989     0,991     0
                 1,000    0,153    0,868      1,000    0,929      0,858    0,989     0,988     1
Weighted Avg.    0,924    0,076    0,934      0,924    0,923      0,858    0,989     0,989
=== Confusion Matrix ===
   a   b   <-- classified as
 439  79 |   a = 0
   0 518 |   b = 1