
I run

Y_testing_obtained = classify(X_testing, X_training, Y_training);

and the error I get is

Error using ==> classify at 246
The pooled covariance matrix of TRAINING must be positive definite.

X_training is a 1550 x 5 matrix. Can you please tell me what this error means, i.e. why it is appearing, and how to work around it? Thanks

Paul R
Yebo
  • Mention the dimensions of other variables too. X_testing, X_training need to have same number of columns, X_training, Y_training need to have same number of rows. – Pavan Yalamanchili May 07 '11 at 22:48
  • Classify is creating a covariance matrix based on the values you provide. The key is X_training and Y_training and those must be properly set. X_training builds the covariance matrix, so make sure it is correct before worrying about X_testing. – Rasman May 08 '11 at 19:39
  • If you have NaN values in your training data matrix it can produce a positive definite error – BGreene Oct 19 '12 at 09:07

2 Answers


Explanation: When you run the function classify without specifying the type of discriminant function (as you did), Matlab uses Linear Discriminant Analysis (LDA). Without going into too much detail on LDA, the algorithm needs to calculate the pooled covariance matrix of X_training (the TRAINING argument named in the error message) in order to solve an optimisation problem, and this matrix has to be positive definite (see Wikipedia: Positive-definite matrix). The underlying assumption is that your data is represented by a multivariate probability distribution, which always has a positive definite covariance matrix unless one or more variables are exact linear combinations of the others.
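As a rough way to check whether this is what is happening, the sketch below builds a pooled within-class covariance by hand and inspects it. The data is a synthetic stand-in (an assumption, not the asker's matrix) in which column 5 is deliberately the sum of columns 1 and 2. The names Xc, S_pooled, rankXc and minEig are illustrative, and this is not necessarily how classify computes the matrix internally.

```matlab
% Synthetic stand-in for the training data (assumption: 5 features, 2 classes).
rng(0);                                   % reproducible random numbers
n = 200;
X_training = randn(n, 4);
X_training(:, 5) = X_training(:, 1) + X_training(:, 2);  % exact linear combination
Y_training = [ones(n/2, 1); 2 * ones(n/2, 1)];

% 1) Missing values can also trigger the error.
hasNaN = any(isnan(X_training(:)));

% 2) Centre each class on its own mean, then pool the scatter.
classes = unique(Y_training);
Xc = X_training;
for k = 1:numel(classes)
    idx = (Y_training == classes(k));
    classMean = mean(X_training(idx, :), 1);
    Xc(idx, :) = bsxfun(@minus, X_training(idx, :), classMean);
end
S_pooled = (Xc' * Xc) / (n - numel(classes));

% 3) A rank below the number of columns, or a smallest eigenvalue that is
%    zero up to rounding, means the matrix is not positive definite.
rankXc = rank(Xc);
minEig = min(eig((S_pooled + S_pooled') / 2));

fprintf('NaN present: %d, rank: %d of %d columns, smallest eigenvalue: %g\n', ...
        hasNaN, rankXc, size(X_training, 2), minEig);
```

If the rank comes out lower than the number of columns on your own X_training, at least one column is redundant within the classes.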

To solve your problem: It is possible that one of your variables is a linear combination of the others. You can try selecting a sensible subset of your variables, or perform Principal Component Analysis (PCA) on the training data and then classify using the first few principal components. Or, you could specify the type of discriminant function and choose one of the two naive Bayes classifiers, for example:

Y_testing_obtained = classify(X_testing, X_training, Y_training, 'diaglinear');
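The PCA route could look roughly like the following sketch. It uses synthetic data (an assumption) with one redundant column, computes the principal directions with svd rather than a toolbox PCA function, and keeps only the components whose singular values are numerically non-zero. The variables mu, V, k and the score matrices are illustrative names, and in practice you might keep fewer components than this.

```matlab
% Synthetic two-class data with a redundant fifth column (assumption).
rng(1);
n = 200;
Y_training = [ones(n/2, 1); 2 * ones(n/2, 1)];
X_training = randn(n, 4);
X_training(n/2+1:end, 1) = X_training(n/2+1:end, 1) + 2;   % shift class 2
X_training(:, 5) = X_training(:, 1) + X_training(:, 2);    % linear combination

X_testing = randn(20, 4);
X_testing(:, 5) = X_testing(:, 1) + X_testing(:, 2);

% PCA on the training data only: centre, then take the right singular vectors.
mu = mean(X_training, 1);
[~, S, V] = svd(bsxfun(@minus, X_training, mu), 'econ');
s = diag(S);

% Keep the components with non-negligible singular values
% (same tolerance rule that rank() uses by default).
k = sum(s > max(size(X_training)) * eps(s(1)));

% Project both sets using the TRAINING mean and directions.
scores_training = bsxfun(@minus, X_training, mu) * V(:, 1:k);
scores_testing  = bsxfun(@minus, X_testing,  mu) * V(:, 1:k);

% LDA on the reduced data; the redundant direction has been dropped.
Y_testing_obtained = classify(scores_testing, scores_training, Y_training);
```

The test data is projected with the mean and directions estimated from the training data, so no information from X_testing leaks into the model.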

As a side note, you also need to have more observations (rows) than variables (columns), but in your case this is not the problem as you seem to have 1550 observations and 5 variables.

Finally, you can also have a look at the answers posted to a similar question on the Matlab forum.

Melissa
  • Thanks! Just started learning classification and 'diaglinear' solved my problem, much appreciated for the explanation! – Austin Sep 16 '14 at 18:32

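Try regularizing the discriminant analysis. In Matlab, the cvshrink function cross-validates the regularization of a linear discriminant model, which can help you pick how much regularization to apply.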
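To illustrate why regularization helps, here is a hand-written shrinkage sketch: it blends the pooled covariance with its own diagonal, which makes a singular matrix positive definite. This is only meant to show the idea; it is not cvshrink itself and not necessarily the exact formula Matlab uses. The data is synthetic and the value of shrinkage is a hypothetical choice that would normally be tuned by cross-validation.

```matlab
% Synthetic data with a redundant fifth column (assumption).
rng(2);
n = 200;
X_training = randn(n, 4);
X_training(:, 5) = X_training(:, 1) + X_training(:, 2);
Y_training = [ones(n/2, 1); 2 * ones(n/2, 1)];

% Pooled within-class covariance, built by hand.
classes = unique(Y_training);
Xc = X_training;
for k = 1:numel(classes)
    idx = (Y_training == classes(k));
    Xc(idx, :) = bsxfun(@minus, X_training(idx, :), mean(X_training(idx, :), 1));
end
S_pooled = (Xc' * Xc) / (n - numel(classes));

% Shrink towards the diagonal. shrinkage = 0 leaves S_pooled unchanged,
% shrinkage = 1 keeps only the per-variable variances.
shrinkage = 0.1;                          % hypothetical value, tune in practice
S_reg = (1 - shrinkage) * S_pooled + shrinkage * diag(diag(S_pooled));

% Before: smallest eigenvalue is zero up to rounding.
minEigBefore = min(eig((S_pooled + S_pooled') / 2));

% After: chol succeeds (second output is 0) only for positive definite input.
[~, p] = chol(S_reg);
isPosDef = (p == 0);

fprintf('smallest eigenvalue before: %g, positive definite after: %d\n', ...
        minEigBefore, isPosDef);
```

With shrinkage set to 1 the off-diagonal terms vanish entirely, which corresponds to the diagonal-covariance assumption behind the 'diaglinear' option.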

Spandyie