
I have some simple MATLAB code that generates random data and then classifies it with a Euclidean and a Mahalanobis classifier. The issue I am having is that the error results for the two classifiers are always identical: they both always misclassify the same vectors, even though the data is different each time.

The data is created in a simple way so the results are easy to check. Because the three classes are equiprobable, I just generate 333 random vectors for each class and concatenate them all into X to be classified. The expected labels are therefore 333 ones, followed by 333 twos, followed by 333 threes.

I can tell the classifiers work, because the data created by mvnrnd is different on every run and the error changes from run to run. But between the two classifiers the error never differs.

Can anyone tell me why?

% Create some initial values, means, covariance matrix, etc
c = 3;
P = 1/c; % All 3 classes are equiprobable
N = 999;
m1 = [1, 1];
m2 = [12, 8];
m3 = [16, 1];
m = [m1; m2; m3];
S = [4 0; 0 4];    % All share the same covar matrix

% Generate random data for each class
X1 = mvnrnd(m1, S, N*P);
X2 = mvnrnd(m2, S, N*P);
X3 = mvnrnd(m3, S, N*P);
X = [X1; X2; X3];

% Create the true-label array xEst to compare results against:
% 333 ones, then 333 twos, then 333 threes
xEst = ceil((3/999:3/999:3));

% Do the actual classification for mahalanobis and euclidean
zEuc = euc_mal_classifier(m', S, P, X', c, N, true);
zMal = euc_mal_classifier(m', S, P, X', c, N, false);

% Check the results
numEucErr = 0;
numMalErr = 0;
for i=1:N
    if(zEuc(i) ~= xEst(i))
        numEucErr = numEucErr + 1;
    end
    if(zMal(i) ~= xEst(i))
        numMalErr = numMalErr + 1;
    end
end

% Tell the user the results of the classification
strE = ['Euclidean classifier error percent: ', num2str((numEucErr/N) * 100)];
strM = ['Mahalanob classifier error percent: ', num2str((numMalErr/N) * 100)];
disp(strE);
disp(strM);

And the classifier

function z = euc_mal_classifier(m, S, P, X, c, N, eOrM)
  % m: one class mean per column, S: shared covariance matrix,
  % X: one sample per column. P (the priors) is accepted but unused,
  % since equal priors do not affect a minimum-distance rule.
  t = zeros(1, c);
  z = zeros(1, N);
  for i = 1:N
      for j = 1:c
          if eOrM
              t(j) = sqrt((X(:,i) - m(:,j))' * (X(:,i) - m(:,j)));
          else
              t(j) = sqrt((X(:,i) - m(:,j))' * (S \ (X(:,i) - m(:,j))));
          end
      end
      [~, z(i)] = min(t);   % assign the class with the smallest distance
  end
end

2 Answers


The reason why there is no difference in classification lies in your covariance matrix.

Assume the difference vector between a point and the center of a class is [x, y].

For Euclidean, the distance is then:

sqrt(x*x + y*y);

For Mahalanobis:

Inverse of covariance matrix:

inv([a,0;0,a]) = [1/a,0;0,1/a]

Distance is then:

sqrt(x*x/a + y*y/a) = 1/sqrt(a) * sqrt(x*x + y*y)

So the Mahalanobis distances are the same as the Euclidean ones, only multiplied by a constant. Since that scale factor is the same for all classes and both dimensions, the minimum is attained at the same class, and you will not find any difference in your class assignments!

Test it with a different covariance matrix and you will find that the errors differ.
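To see it concretely, here is a minimal sketch that reuses the variables from your script (m, P, X, c, N and the class means m1, m2, m3); the non-spherical matrix S2 below is just an arbitrary positive-definite example, not anything from your code:

% With the spherical covariance from the question, the two rules agree
zE = euc_mal_classifier(m', S, P, X', c, N, true);
zM = euc_mal_classifier(m', S, P, X', c, N, false);
disp(isequal(zE, zM));    % prints 1: identical assignments

% With a non-spherical covariance they can disagree
S2 = [4 3; 3 4];          % arbitrary positive-definite example
X2 = [mvnrnd(m1, S2, N*P); mvnrnd(m2, S2, N*P); mvnrnd(m3, S2, N*P)];
zE2 = euc_mal_classifier(m', S2, P, X2', c, N, true);
zM2 = euc_mal_classifier(m', S2, P, X2', c, N, false);
disp(isequal(zE2, zM2));  % usually prints 0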


Because this kind of data uses a (scaled) identity covariance matrix, all the classifiers give almost the same performance. With a non-identity covariance matrix, the three classifiers lead to different errors:

err_bayesian    = 0.0861
err_euclidean   = 0.1331
err_mahalanobis = 0.0871

close('all'); clear;

% Generate and plot dataset X1
m1 = [1, 1]'; m2 = [10, 5]'; m3 = [11, 1]';
m=[m1 m2 m3];

S1 = [7 4 ; 4 5];
S(:,:,1)=S1;
S(:,:,2)=S1;
S(:,:,3)=S1;

P=[1/3 1/3 1/3];
N=1000;
randn('seed',0);    % legacy seeding; use rng(0) in newer MATLAB
[X, y] = generate_gauss_classes(m, S, P, N);
plot_data(X, y, m, 1);

randn('seed',200);
[X4, y1] = generate_gauss_classes(m, S, P, N);


% 2.5_b.1 Applying Bayesian classifier
z_bayesian=bayes_classifier(m,S,P,X4);

% 2.5_b.2 Apply ML estimates of the mean values and covariance matrix (common to all three
% classes) using function Gaussian_ML_estimate
class1_data=X(:,find(y==1));
[m1_hat, S1_hat]=Gaussian_ML_estimate(class1_data);
class2_data=X(:,find(y==2));
[m2_hat, S2_hat]=Gaussian_ML_estimate(class2_data);
class3_data=X(:,find(y==3));
[m3_hat, S3_hat]=Gaussian_ML_estimate(class3_data);
S_hat=(1/3)*(S1_hat+S2_hat+S3_hat);
m_hat=[m1_hat m2_hat m3_hat];

% Apply the Euclidean distance classifier, using the ML estimates of the means, in order to
% classify the data vectors of X1
z_euclidean=euclidean_classifier(m_hat,X4);

% 2.5_b.3 Similarly, for the Mahalanobis distance classifier, we have
z_mahalanobis=mahalanobis_classifier(m_hat,S_hat,X4);



%  2.5_c. Compute the error probability for each classifier
err_bayesian = (1-length(find(y1==z_bayesian))/length(y1))
err_euclidean = (1-length(find(y1==z_euclidean))/length(y1))
err_mahalanobis = (1-length(find(y1==z_mahalanobis))/length(y1))
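The helper functions used above (generate_gauss_classes, Gaussian_ML_estimate, plot_data, bayes_classifier, euclidean_classifier, mahalanobis_classifier) come from the companion code of Theodoridis & Koutroumbas' pattern recognition book and are not listed here. For reference, here is a minimal sketch of the two distance classifiers, with the interfaces guessed from the calls above (hypothetical re-implementations, not the book's exact code):

% Sketch: m holds one class mean per column, X one sample per column.
function z = euclidean_classifier(m, X)
  [~, c] = size(m);
  [~, N] = size(X);
  z = zeros(1, N);
  for i = 1:N
      d = zeros(1, c);
      for j = 1:c
          d(j) = norm(X(:,i) - m(:,j));   % Euclidean distance to mean j
      end
      [~, z(i)] = min(d);                 % nearest mean wins
  end
end

function z = mahalanobis_classifier(m, S, X)
  [~, c] = size(m);
  [~, N] = size(X);
  z = zeros(1, N);
  for i = 1:N
      d = zeros(1, c);
      for j = 1:c
          v = X(:,i) - m(:,j);
          d(j) = sqrt(v' * (S \ v));      % Mahalanobis distance to mean j
      end
      [~, z(i)] = min(d);
  end
end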