
I get this error:

Warning: TRAINING can only contain non-negative integers when
'Distribution' is set to 'mn'. Rows of TRAINING with invalid values
will be removed.
> In NaiveBayes.fit at 317

??? Error using ==> NaiveBayes.fit>mnfit at 647
At least one valid observation in each class is required.

Error in ==> NaiveBayes.fit at 496
    obj = mnfit(obj, training, gindex);

This is what I have:

training_data = Testdata; 
target_class = TestDataLabels;

%# train model
nb = NaiveBayes.fit(training_data, target_class, 'Distribution', 'mn');

%# prediction
class1 = nb.predict(UnseenTestdata); 

%# performance
cmat1 = confusionmat(UnseenTestDataLabels, class1);
acc1 = 100*sum(diag(cmat1))./sum(cmat1(:));
fprintf('Classifier1:\naccuracy = %.2f%%\n', acc1);
fprintf('Confusion Matrix:\n'), disp(cmat1)

The dataset is 4940201x42 if anyone is wondering.


1 Answer


You've got two problems.

First, for a multinomial distribution, MATLAB requires your data to contain only non-negative integer values. Second, it seems that for at least one of your classes there are no valid observations left. This might be because of NaNs, Infs, negative values, or non-integer values in the rows of Testdata.

As the warning says, rows with invalid values will be removed, so my bet is that every row belonging to at least one class was removed; that is what triggers the second error.
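A rough diagnostic along these lines (the variable names Testdata and TestDataLabels are taken from your question, and numeric class labels are assumed) should show which rows 'mn' would reject and which classes end up empty:

%# rows that 'mn' rejects: negative, non-integer, NaN or Inf entries
bad = any(Testdata < 0 | Testdata ~= round(Testdata) | ~isfinite(Testdata), 2);
fprintf('%d of %d rows are invalid for ''mn''\n', sum(bad), size(Testdata, 1));

%# count the valid observations left in each class
classes = unique(TestDataLabels);
for i = 1:numel(classes)
    nValid = sum(~bad & TestDataLabels == classes(i));
    fprintf('class %g: %d valid observations\n', classes(i), nValid);
end

Any class that reports 0 valid observations is the one causing "At least one valid observation in each class is required."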

  • Does that mean anything with a decimal? I can't see any negative numbers, and if the invalid rows are removed, why throw the second error? – G Gr Nov 21 '12 at 16:20
  • "Integer" usually means without a decimal, yes :) . The second error is (probably) because you no longer have examples from one of the classes (unique values of target_class), so there's nothing for the classifier to do. – Pete Nov 21 '12 at 16:23
  • Ah OK, what do you think my options are: round the values up or down, or remove those rows? – G Gr Nov 21 '12 at 16:28
  • That depends on your data, application, etc. If your data is not multinomial, why are you using a multinomial distribution? You can try specifying a different distribution; see the help: help NaiveBayes.fit (a sketch of one alternative follows after this thread). – Pete Nov 21 '12 at 16:31
  • I had changed to multinomial because of this post [here](http://stackoverflow.com/questions/13427664/the-within-class-variance-in-each-feature-of-training-must-be-positive). It solved my problem, but because the predictions were poor I wanted to add more data, so I re-added some of the extra columns. – G Gr Nov 21 '12 at 16:38
  • If your data is of mixed type (multinomial, Gaussian, etc.) then MATLAB's NaiveBayes is probably not going to work for you. – Pete Nov 21 '12 at 16:45
  • Well, I had been at 81% accuracy; adding the extra columns worked, increasing it by a whole 2%. It does work, just not that well. – G Gr Nov 21 '12 at 17:26
  • Thanks for the accept; since your data appears to be difficult to describe, consider using a random forest, for good results: http://en.wikipedia.org/wiki/Random_forest – Pete Nov 21 '12 at 17:28
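
As referenced in the comments, here is a rough sketch of what trying a different distribution could look like. Kernel-smoothed densities cope with continuous, non-integer features; the variable names are the ones from the question, and this is only an illustration, not something tested on this data:

%# same training call as in the question, but with kernel-density
%# estimates per feature instead of multinomial counts
nb = NaiveBayes.fit(training_data, target_class, 'Distribution', 'kernel');
class1 = nb.predict(UnseenTestdata);

Note that kernel estimation will be slow on a 4940201x42 matrix, so treat this as a starting point rather than a recommendation.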