I started with 400,000 images whose gray values had been normalized (gray-value increase). I then computed the DFT of each image, which gave me a data set of 400,000 samples with 3,200 features each (the absolute values of the Fourier coefficients).
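The feature-extraction step can be sketched roughly as follows. This is a minimal sketch, not the pipeline actually used. The image size of 40 x 80 pixels (which happens to give 3,200 coefficients) and the function name `dft_magnitude_features` are hypothetical, and random arrays stand in for the normalized images.

```python
import numpy as np

# Hypothetical image size: 40 x 80 pixels = 3200 coefficients per image.
# The real image size is not stated in the question.
HEIGHT, WIDTH = 40, 80

def dft_magnitude_features(images: np.ndarray) -> np.ndarray:
    """Return one row of |DFT| coefficients per image (hypothetical helper).

    images: array of shape (n_images, HEIGHT, WIDTH), assumed already normalized.
    """
    # fft2 transforms over the last two axes, i.e. each image separately
    spectra = np.fft.fft2(images)
    # keep only the magnitudes and flatten each spectrum into one feature row
    return np.abs(spectra).reshape(images.shape[0], -1)

rng = np.random.default_rng(0)
images = rng.random((5, HEIGHT, WIDTH))  # stand-in for the real images
samples = dft_magnitude_features(images)
print(samples.shape)  # (5, 3200)
```

Because the images are real-valued, the magnitude spectrum is symmetric, so roughly half of these coefficients repeat the other half; `np.fft.rfft2` returns only the non-redundant part if that matters for the feature count.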
Now I would like to perform a PCA and an SVD on this data. Since the data is already normalized and all values have the same units, I thought I could use the raw data directly for both calculations. However, the explained variance I get from the PCA eigenvalues does not match the one I get from the SVD singular values.
[Image: cumulative share of variance for PCA (left) and SVD (right)]
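The relationship between the two sets of numbers can be checked on a small synthetic matrix. This is a minimal sketch, assuming `svd` refers to `numpy.linalg.svd`; the matrix size, offset and seed are arbitrary. For column-centered data `Xc` with `n` rows, the covariance matrix is `Xc.T @ Xc / (n - 1)`, so its eigenvalues equal the squared singular values of `Xc` divided by `n - 1`. Absolute Fourier coefficients are non-negative, so their column means are not zero, and the identity does not hold for the raw matrix.

```python
import numpy as np

rng = np.random.default_rng(42)
# Small synthetic stand-in for the 400000 x 3200 matrix; all values are positive
# (like |DFT| coefficients), so the column means are far from zero.
X = rng.random((500, 6)) + 3.0
n = X.shape[0]

# PCA route: np.cov subtracts the column means internally (ddof = 1)
eig_cov = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]  # descending

# SVD on the raw (uncentered) data
s_raw = np.linalg.svd(X, compute_uv=False)

# SVD on the column-centered data
Xc = X - X.mean(axis=0)
s_centered = np.linalg.svd(Xc, compute_uv=False)

print(np.allclose(eig_cov, s_centered**2 / (n - 1)))  # True
print(np.allclose(eig_cov, s_raw**2 / (n - 1)))       # False
```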
What am I doing wrong? In what form should the data be passed to PCA and SVD: normalized, standardized, or raw?
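For the "standardized" option, here is a sketch of what changes (synthetic data; the sizes, column scales and seed are arbitrary assumptions). Centering alone corresponds to PCA on the covariance matrix; additionally dividing each column by its sample standard deviation corresponds to PCA on the correlation matrix, and the SVD of that standardized matrix reproduces the correlation-matrix eigenvalues as `s**2 / (n - 1)`.

```python
import numpy as np

rng = np.random.default_rng(7)
# Synthetic stand-in data; the columns deliberately have different scales
X = rng.random((400, 5)) * np.array([1.0, 2.0, 5.0, 10.0, 50.0])
n = X.shape[0]

# Standardize: center each column and divide by its sample standard deviation
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# PCA on standardized data is PCA on the correlation matrix
eig_corr = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]  # descending
s_std = np.linalg.svd(Z, compute_uv=False)

print(np.allclose(eig_corr, s_std**2 / (n - 1)))  # True
# The eigenvalues of a correlation matrix sum to the number of features
print(np.isclose(eig_corr.sum(), X.shape[1]))      # True
```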
Please help me! Thank you
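My code:

```python
import numpy as np
import matplotlib.pyplot as plt
from numpy.linalg import svd  # assumed; scipy.linalg.svd accepts the same arguments here

# samples: array of shape (400000, 3200) holding the absolute Fourier coefficients

# SVD of the raw (uncentered) samples
U, S, VT = svd(samples, full_matrices=False)
var_exp_S = S / S.sum()            # share of each singular value (not squared)
cum_var_exp_S = np.cumsum(var_exp_S)

# PCA: eigendecomposition of the covariance matrix (np.cov centers the columns)
cov_mat = np.cov(samples.T)
# eigh suits the symmetric covariance matrix; it returns real eigenvalues in ascending order
eigen_vals, eigen_vecs = np.linalg.eigh(cov_mat)
eigen_vals = eigen_vals[::-1]      # descending order
var_exp = eigen_vals / eigen_vals.sum()
cum_var_exp = np.cumsum(var_exp)

num = 3200
plt.figure(figsize=(10, 5))

plt.subplot(121)
plt.title('PCA')
plt.step(range(1, num + 1), cum_var_exp[:num], where='mid', color='r',
         label='cumulative share of variance')
plt.ylabel('share of variance')
plt.xlabel('principal components')
plt.legend()
plt.grid()

plt.subplot(122)
plt.title('SVD')
plt.step(range(1, num + 1), cum_var_exp_S[:num], where='mid', color='r',
         label='cumulative share of variance')
plt.ylabel('share of variance')
plt.xlabel('principal components')
plt.legend()
plt.grid()

plt.tight_layout()
plt.show()
```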
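For comparison, here is a sketch that computes the cumulative share of variance both ways so the two curves agree. The function names are hypothetical and the data is synthetic. It centers the data before the SVD, squares the singular values, and passes `compute_uv=False`; with `full_matrices=False`, `U` for a 400,000 x 3,200 float64 matrix alone would take about 10 GB.

```python
import numpy as np

def cumulative_explained_variance_svd(X: np.ndarray) -> np.ndarray:
    """Cumulative share of variance from the SVD of the centered data (hypothetical helper)."""
    Xc = X - X.mean(axis=0)
    # compute_uv=False returns only the singular values, so the large
    # U matrix (n_samples x n_features) is never allocated
    s = np.linalg.svd(Xc, compute_uv=False)
    var = s**2 / (X.shape[0] - 1)  # variances along the principal axes
    return np.cumsum(var) / var.sum()

def cumulative_explained_variance_pca(X: np.ndarray) -> np.ndarray:
    """Cumulative share of variance from the covariance eigenvalues (hypothetical helper)."""
    # eigvalsh is meant for symmetric matrices and returns real values, ascending
    eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    return np.cumsum(eig) / eig.sum()

rng = np.random.default_rng(1)
X = rng.random((1000, 20)) + 2.0  # synthetic stand-in for `samples`
cev_svd = cumulative_explained_variance_svd(X)
cev_pca = cumulative_explained_variance_pca(X)
print(np.allclose(cev_svd, cev_pca))  # True
```

Either result can stand in for `cum_var_exp` or `cum_var_exp_S` in the plotting code. Note that `X - X.mean(axis=0)` makes a full copy of the data, which matters at this size.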