
I am comparing eig and eigh and I would like clarification of why the results are partly inverted!

f1 and f2 are my 2 features.

pc1 and pc2 are the principal components using eig.

pc1h and pc2h are using eigh.

As can be seen, pc2 and pc2h appear very similar.

But pc1 appears inverted to pc1h. Also pc1h is similar to f2 in shape, which I would expect.

Why is pc1 upside-down?

[plots of f1, f2 and of the principal components pc1, pc2, pc1h, pc2h]

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt

#Generate a dummy dataset.
np.random.seed(300)
X = np.random.randint(10,50,100).reshape(50,2)
X[:,1] =X[:,1]*10
def PCA(X, num_components):
    # Step-1: center the data by subtracting the per-feature mean
    X_meaned = X - np.mean(X, axis=0)

    # Step-2: covariance matrix of the centered data
    cov_mat = np.cov(X_meaned, rowvar=False)

    # Step-3: eigendecomposition of the covariance matrix with eig
    eigen_values, eigen_vectors = np.linalg.eig(cov_mat)

    # Step-4: sort eigenvalues (and their eigenvectors) in descending order
    sorted_index = np.argsort(eigen_values)[::-1]
    sorted_eigenvalue = eigen_values[sorted_index]
    sorted_eigenvectors = eigen_vectors[:, sorted_index]

    # Step-5: keep the first num_components eigenvectors
    eigenvector_subset = sorted_eigenvectors[:, 0:num_components]

    # Step-6: project the centered data onto the selected eigenvectors
    X_reduced = np.dot(eigenvector_subset.transpose(), X_meaned.transpose()).transpose()

    return X_reduced

def PCAh(X, num_components):
    # Step-1: center the data by subtracting the per-feature mean
    X_meaned = X - np.mean(X, axis=0)

    # Step-2: covariance matrix of the centered data
    cov_mat = np.cov(X_meaned, rowvar=False)

    # Step-3: eigendecomposition of the covariance matrix with eigh
    eigen_values, eigen_vectors = np.linalg.eigh(cov_mat)

    # Step-4: sort eigenvalues (and their eigenvectors) in descending order
    sorted_index = np.argsort(eigen_values)[::-1]
    sorted_eigenvalue = eigen_values[sorted_index]
    sorted_eigenvectors = eigen_vectors[:, sorted_index]

    # Step-5: keep the first num_components eigenvectors
    eigenvector_subset = sorted_eigenvectors[:, 0:num_components]

    # Step-6: project the centered data onto the selected eigenvectors
    X_reduced = np.dot(eigenvector_subset.transpose(), X_meaned.transpose()).transpose()

    return X_reduced

X_reduced = PCA(X,2)
X_reducedh = PCAh(X,2)

df_y = pd.DataFrame()
df_y['f1'] = X[:,0]
df_y['f2'] = X[:,1]
df_y['pc1'] = X_reduced[:,0]
df_y['pc2'] = X_reduced[:,1]
df_y['pc1h'] = X_reducedh[:,0]
df_y['pc2h'] = X_reducedh[:,1]
df_y.plot()
plt.show(block=True)
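
A quick check with the variables above (just a sketch, not part of the plotting code) can tell whether the two projections differ only by a sign flip on the first component:

print(np.allclose(X_reduced[:, 0], -X_reducedh[:, 0]))  # expected True if pc1 is simply pc1h with the sign flipped
print(np.allclose(X_reduced[:, 1],  X_reducedh[:, 1]))  # expected True if pc2 and pc2h match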

1 Answer


I don't see any other answer to this "why?" than "why not"

Understand that there is always an infinite number of eigenvectors for a given eigenvalue. So the choice of which eigenvector to return is arbitrary.

For matrix M

[[-1/2,  3/2],
 [ 3/2, -1/2]]

The eigenvalues are λ₁=-2 and λ₂=1. Example eigenvectors are u₁=[1,-1] and u₂=[1,1].

You can check that M@u₂ = M@[1,1] = [-1/2+3/2, 3/2-1/2] = [1,1] = λ₂u₂.
And M@u₁ = M@[1,-1] = [-1/2-3/2, 3/2+1/2 ] = [-2, 2] = -2[1,-1] = λ₁u₁.
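
A minimal numpy sketch of the same check (note that np.linalg.eig may return the eigenvalues in a different order, and the eigenvectors normalized):

import numpy as np

M = np.array([[-0.5,  1.5],
              [ 1.5, -0.5]])
vals, vecs = np.linalg.eig(M)
print(vals)   # eigenvalues, e.g. [ 1. -2.] (order not guaranteed)
print(vecs)   # one normalized eigenvector per column
for lam, v in zip(vals, vecs.T):
    print(np.allclose(M @ v, lam * v))   # True for every (eigenvalue, eigenvector) pair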

So, sure, since you have 2 different eigenvalues in a 2D space, you know that each eigenspace has dimension 1. So all eigenvectors associated with the same eigenvalue are collinear.

But that still leaves room for an infinite number of them. For example, v₁=[-2,2] and v₂=[-3,-3] are also eigenvectors: M@[-3,-3] = [3/2-9/2, -9/2+3/2] = [-3,-3] = λ₂[-3,-3],
and M@[-2,2] = [1+3, -3-1] = [4,-4] = -2[-2,2] = λ₁[-2,2].

So if you have a data point X=[1,2] in the canonical space, the representation of that X in the system with basis u₁,u₂ = [1,-1], [1,1] is

X_reduced_v1 = [-0.5, 1.5]
# meaning that X is -0.5[1,-1]+1.5[1,1], which is indeed [1,2]

If you represent the same data X=[1,2] in another system with basis v₁,v₂=[-2,2],[-3,-3], then you will have

X_reduced_v2 = [1/4, -1/2]
# Meaning that X is 1/4[-2,2]-1/2[-3,-3], which is indeed [1, 2]
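
In code, the coordinates of a point in any basis (orthonormal or not) can be obtained by solving the linear system whose columns are the basis vectors; a small sketch of both representations above:

import numpy as np

X = np.array([1.0, 2.0])
B_u = np.array([[ 1.0,  1.0],    # columns are u1=[1,-1] and u2=[1,1]
                [-1.0,  1.0]])
B_v = np.array([[-2.0, -3.0],    # columns are v1=[-2,2] and v2=[-3,-3]
                [ 2.0, -3.0]])

print(np.linalg.solve(B_u, X))   # [-0.5   1.5 ]
print(np.linalg.solve(B_v, X))   # [ 0.25 -0.5 ]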

So, both reductions are correct, and both are done in a basis made of eigenvectors of my matrix. So both represent X equally well in terms of eigenvectors.

Now, np.linalg.eig and np.linalg.eigh both return normalized eigenvectors. Since in my example each eigenspace has dimension 1, that greatly reduces the infinite number of possibilities, which is convenient in your case, because you are obviously doing this for PCA. But even then, although there is no longer an infinite number of possible eigenvectors, there are still 2 possibilities for each eigenvector.

In my example, the eigenvectors with norm 1 could be U₁=[1/√2, -1/√2] for the one associated with λ₁=-2, and U₂=[1/√2, 1/√2] for the one associated with λ₂=1. In such a system, my vector [1,2] would have coordinates

X_reduced_v3 = [-0.70710678,  2.12132034]
# You can check that indeed, -0.70710678*U₁ + 2.12132034*U₂ = [1,2]

But V₁=[-1/√2, 1/√2] and V₂=[-1/√2, -1/√2] are also eigenvectors of norm 1 associated with eigenvalues λ₁=-2 and λ₂=1.

In the system V₁,V₂, the coordinates of X become

X_reduced_v4 = [ 0.70710678, -2.12132034]
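
Since U₁,U₂ and V₁,V₂ are orthonormal, these coordinates are just dot products; a sketch:

import numpy as np

X = np.array([1.0, 2.0])
s = 1 / np.sqrt(2)
U = np.array([[ s,  s],    # columns are U1=[1/√2,-1/√2] and U2=[1/√2,1/√2]
              [-s,  s]])
V = -U                     # V1, V2 are the same vectors with the sign flipped

print(U.T @ X)             # [-0.70710678  2.12132034]
print(V.T @ X)             # [ 0.70710678 -2.12132034]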

There is absolutely no criterion left to choose between those 2 systems (nor the 2 other systems I haven't mentioned, made of the vectors [1/√2, -1/√2], [-1/√2, -1/√2] and of the vectors [-1/√2, 1/√2], [1/√2, 1/√2]; so 4 different possible bases). Those 4 bases are made of eigenvectors of my matrix, those 4 bases are made of vectors of norm 1, and those 4 bases are sorted by their eigenvalues. There is no other criterion, not even an arbitrary one, to choose among those 4.

So, it really doesn't matter which one you, or the diagonalization algorithm you use, choose. The only thing that matters is that you are consistent. If you choose basis U₁,U₂, then you must say that X_reduced is [-0.70710678, 2.12132034] (that is what you call [pca₁, pca₂]).

If you choose basis V₁, V₂, then you must say that X_reduced is [0.70710678, -2.12132034]

So, whatever basis (eigenvectors) W₁,W₂ you choose, X = pca₁W₁ + pca₂W₂ remains true. There are multiple, equally valid W₁,W₂,pca₁,pca₂ that do that. Which one is chosen is totally arbitrary.
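
You can check that consistency on your own data X from above: both sets of eigenvectors reconstruct the same centered data, even though the coordinates themselves differ. A sketch (recomputing the two eigendecompositions outside your functions; it relies on the eigenvectors of a symmetric covariance matrix with distinct eigenvalues being orthonormal):

import numpy as np

X_meaned = X - np.mean(X, axis=0)
cov_mat = np.cov(X_meaned, rowvar=False)

_, W  = np.linalg.eig(cov_mat)    # basis chosen by eig
_, Wh = np.linalg.eigh(cov_mat)   # basis chosen by eigh (possibly flipped/reordered)

# Coordinates differ, but projecting back recovers the same centered data.
print(np.allclose((X_meaned @ W)  @ W.T,  X_meaned))   # True
print(np.allclose((X_meaned @ Wh) @ Wh.T, X_meaned))   # True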

In your case, eig happens to choose one while eigh chooses another. That's all.
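
If you do want a deterministic choice (for instance to make the eig and eigh versions agree), one common convention, sketched below, is to flip each eigenvector so that its largest-magnitude component is positive (roughly what scikit-learn does to stabilise signs):

import numpy as np

def fix_signs(eigenvectors):
    # Flip each column so that its entry of largest absolute value is positive.
    # This removes the arbitrary sign choice made by eig/eigh.
    idx = np.argmax(np.abs(eigenvectors), axis=0)
    signs = np.sign(eigenvectors[idx, np.arange(eigenvectors.shape[1])])
    return eigenvectors * signs

Applying fix_signs to sorted_eigenvectors in both PCA and PCAh (just after Step-4) should make pc1/pc2 and pc1h/pc2h coincide.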

  • Hi, thank you for this very full answer. However, I am still left with: do I use pc1 and pc2 for my analysis, or pc1h and pc2h? You say both are 'correct', but pc1 looks inverted relative to f1 - so if I use this my analysis will be incorrect, unless I manually flip it over. It would appear that pc1 is negatively correlated to f1. How can I resolve this problem? My pc1 and pc1h have already been multiplied by W I think (X_meaned)? Or am I missing a final step that might re-invert pc1? – ManInMoon Jun 05 '23 at 22:14
  • I would be grateful if you could show me what (from my code example) I need to multiply X_reduced or (pc1,pc2) by as I thought that was already done in Step-6 – ManInMoon Jun 05 '23 at 22:24
  • I would understand what you say if pc1+pc2 == pc1h+pc2h. BUT this does not hold. So I must have done something wrong in one of my functions? – ManInMoon Jun 05 '23 at 22:50
  • May I ask what kind of analysis you are doing? I mean, you didn't say what you want to use pc1 and pc2 for. I fail to see for which kind of analysis it could matter whether pc1 is inverted relative to f1 or not (again, it is just an axis. The same happens with regular axes: depending on the unit you choose the values change, and still they describe the same thing). – chrslg Jun 06 '23 at 01:33
  • For example, if your plan is (and this is quite a classical usage of PCA) to reduce the number of dimensions, to feed a ML algorithm with pc1 and pc2 rather than with the raw features f1/f2 (assuming your example was just an example and that in real life you have far more f? features than pc? main components), then it doesn't matter if pc1 is inverted or not. The ML algorithm would work either way. – chrslg Jun 06 '23 at 01:35
  • Or, another usage of PCA, would be to then draw a scatter plot of your `x_reduced` points to do some gating (for example, draw healthy in green and ill patients in red in scatter plot of their pca1/pca2 coordinates, to expect to see a pattern). In such case, it doesn't matter whether you invert or not those coordinates (all it would change is orientation of your scatter plot, not the fact that you can or cannot separate colors. And orientation is supposed to be meaningless) – chrslg Jun 06 '23 at 01:38
  • You are not supposed to take any information from the raw values of pca1/pca2: those coordinates mean something only in a coordinate system. It is only the relative distribution of those values that is supposed to mean something. And this is invariant to any inversion of axis. – chrslg Jun 06 '23 at 01:39
  • @ManInMoon So, again, it would help understand your concern if you explained why you are doing a PCA in the first place, and what kind of analysis/processing you intend to do with pca1/pca2 values. – chrslg Jun 06 '23 at 01:42
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/253965/discussion-between-maninmoon-and-chrslg). – ManInMoon Jun 06 '23 at 08:57