
I implemented my own manual PCA, then compared the results with MATLAB's pca function. I found an eigenvector differing in sign. I created feature vectors using the two most relevant eigenvectors from both methods, used them to create two-dimensional representations of the original data, and finally tried to reconstruct the original data from those two-dimensional representations. The reconstructed results are very similar. The error, which I calculate as the sum of absolute differences between the original data and the reconstructed data, is also very similar.

So, can I assume the sign difference I observed is irrelevant? In this case the sign difference affects only one of the eigenvectors, but I have observed several occurrences of it when working with other datasets. And which method would you recommend, the manual one or MATLAB's pca function? Why the difference?

Please note that MATLAB's pca function returns eigenvectors ordered by decreasing importance, while mine are ordered by ascending importance.

clear 
clc
format long g 


%   Get some data
load hald
rawData = ingredients;


%   Subtract the mean from the data; the new data set must have zero mean
%   (in practice the column means come out as roughly 1e-16, not exactly 0)
rawDataMean = mean(rawData);
[r, c] = size(rawData);
data = zeros(r, c);
for i = 1 : c
    data(:,i) = rawData(:,i) - rawDataMean(i);
end
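
%   (A vectorized alternative, assuming R2016b or later for implicit
%   expansion; it should produce the same zero-mean matrix as the loop above:)
%data = rawData - rawDataMean;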


%   Get the covariance matrix (column variances along the diagonal)
C = cov(data);

%   Calculate eigenvectors and eigenvalues of the covariance matrix.
%   Each column of the eigenVectors matrix is one eigenvector.
[eigenVectors, eigenValuesM] = eig(C);
eigenValues = diag(eigenValuesM);
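
%   (A sketch of an alternative to the sortrows-based ordering below:
%   sort the eigenvalues in descending order and reindex the eigenvector
%   columns directly:)
%[sortedValues, idx] = sort(eigenValues, 'descend');
%sortedVectors = eigenVectors(:, idx);
%featureVector = sortedVectors(:, 1:2);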


%   Order eigenvectors by eigenvalue; take the two most important as the
%   feature vector
E = zeros(c + 1, c);
E(1,:) = eigenValues';
E(2:end, :) = eigenVectors;
E = E';
E = sortrows(E);
E = E';

firstImportance = E(2:end, c);
secondImportance = E(2:end, c-1);

featureVector = [firstImportance, secondImportance];

%   Get the new data set
finalData = (featureVector' * data')';
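
%   (Equivalently, finalData = data * featureVector; the transposes above
%   express the same projection with the observations as rows.)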

%   When comparing with the manual PCA applied above: E(2:end, :)
%   contains the eigenvectors ordered by ASCENDING importance (eigenvalue),
%   while the built-in pca function returns them ordered by DESCENDING
%   importance, so its first two columns are the most important.
%   The two matrices should match (reading right to left), but the second
%   most important eigenvector differs in sign while the others match. Why?

coeff = pca(ingredients)
fliplr(E(2:end, :))    % manual eigenvectors reordered to descending, for comparison
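
%   (Not something the pca documentation promises, but one common convention
%   for making the signs deterministic is to force the largest-magnitude
%   entry of each column to be positive; a sketch:)
%[~, idx] = max(abs(coeff));
%for j = 1 : c
%    coeff(:, j) = coeff(:, j) * sign(coeff(idx(j), j));
%end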

%   If we try to get the original data back, that should identify the
%   right method.
%   We use data (the zero-mean version of the original data set) to get the
%   data projected onto the main components identified by each method...

%eigenData1 = [E(2:end, c), E(2:end, c-1), E(2:end, c-2), E(2:end, c-3)]' * data';
%eigenData2 = coeff' * data';

eigenData1 = [E(2:end, c), E(2:end, c-1)]' * data';
eigenData2 = coeff(:, 1:2)' * data';


%   And then try to get original data back...
%originalData1 = [E(2:end, c), E(2:end, c-1), E(2:end, c-2), E(2:end, c-3)] * eigenData1;
%originalData2 = coeff * eigenData2;

originalData1 = [E(2:end, c), E(2:end, c-1)] * eigenData1;
originalData2 = coeff(:, 1:2) * eigenData2;

originalData1 = originalData1';
originalData2 = originalData2';

%   Then add the subtracted mean:
for i = 1 : c
    originalData1(:,i) = originalData1(:,i) + rawDataMean(i);
    originalData2(:,i) = originalData2(:,i) + rawDataMean(i);
end

%   Get an accumulated absolute measure of the difference between the original
%   raw data and the data reconstructed from the eigenvectors...
error1 = sum(sum(abs(rawData - originalData1)))
error2 = sum(sum(abs(rawData - originalData2)))

%   With this test, error1 (the manual method's error) is slightly smaller
%   than error2 (from the built-in pca function). But the difference is on
%   the order of 1e-13 when using all the eigenvectors, and the errors become
%   essentially the same when only the 1st and 2nd in importance are used.
min(error1, error2)
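
%   (Sanity check on the sign question: negating one basis vector cancels
%   out in the project-then-reconstruct round trip, since
%   (-v)*((-v)'*x) = v*(v'*x). A minimal sketch, assuming R2016b+ implicit
%   expansion for the .* with a row vector:)
%flipped = featureVector .* [1, -1];                       % negate the 2nd column
%roundTrip1 = (featureVector * (featureVector' * data'))';
%roundTrip2 = (flipped * (flipped' * data'))';
%max(max(abs(roundTrip1 - roundTrip2)))                    % exactly 0: the signs cancel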
  • Short answer is yes. You can ignore the sign difference. Note that eigenvectors are **not** unique, as long as they come from the same eigenvalue. Any non-zero scaling you apply to each element of the eigenvector for a particular eigenvalue is equivalent: whether we multiply by a negative number or scale each element up, we're fine. Therefore, the sign difference is acceptable as long as we are comparing eigenvectors for the same eigenvalue. – rayryeng Sep 06 '17 at 17:19
  • I see, thank you so much! – user3117891 Sep 06 '17 at 17:27
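
A quick numeric check of the comment above (a minimal sketch, reusing C, eigenVectors and eigenValuesM from the script): if v is an eigenvector of C with eigenvalue lambda, then so is -v, because C*(-v) = -(C*v) = -(lambda*v) = lambda*(-v).

v = eigenVectors(:, end);            % any eigenvector of C
lambda = eigenValuesM(end, end);     % its matching eigenvalue
norm(C*v - lambda*v)                 % ~0: v satisfies C*v = lambda*v
norm(C*(-v) - lambda*(-v))           % ~0: so does -v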

0 Answers