
I am doing PCA on temperature data (below), with samples in rows and features (1000 hPa, 925 hPa, etc.) in columns:

array([[ 25. ,  22.2,  19. , ..., -51.9, -50.3, -41.1],
       [ 26.8,  22.8,  18.4, ..., -53.1, -49.5, -41.1],
       [ 26.4,  23.4,  19.4, ..., -56.7, -49.7, -41.3],
       ...,
       [  9.4,   6.8,   3.2, ..., -57.7, -55.9, -57.9],
       [ 12.4,   7.4,   3.8, ..., -53.5, -53.9, -56.1],
       [  9.6,   5.8,   4.2, ..., -54.9, -53.1, -50.9]])

I ran PCA.

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
proj = pca.fit_transform(data)
inversed_data = pca.inverse_transform(proj)

(Here, inversed_data holds the estimated values reconstructed from PC1 + PC2, right?)

I separated the estimated values into PC1 and PC2 using pca.components_.

pca.components_

array([[-0.33776309, -0.34230437, -0.33367396, -0.32389647, -0.36274215,
        -0.37980682, -0.33324365, -0.21884887, -0.02131457,  0.16129112,
         0.24344067,  0.15305721,  0.08841673,  0.0262782 ,  0.00574684,
         0.00390428],
       [-0.18303616, -0.29623333, -0.32912031, -0.17544341, -0.08903607,
         0.04295601,  0.37370419,  0.55664452,  0.40733697,  0.0431838 ,
        -0.21696205, -0.20124614, -0.14519851, -0.05066843, -0.01942078,
         0.031218  ]])

But now I am stuck. I want to compare pca.components_ with the original data. To do this, I think I have to inverse-transform pca.components_, but I can't. Do you have any idea?
I tried:

pca.inverse_transform(pca.components_[0])

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-a389a4196f5f> in <module>()
----> 1 pca.inverse_transform(pca.components_[0])

/usr/local/lib/python3.7/dist-packages/sklearn/decomposition/_base.py in inverse_transform(self, X)
    157                             self.components_) + self.mean_
    158         else:
--> 159             return np.dot(X, self.components_) + self.mean_

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (16,) and (2,16) not aligned: 16 (dim 0) != 2 (dim 0)

But it didn't work. Alternatively, can I use sklearn.preprocessing.StandardScaler.inverse_transform() to inverse-transform pca.components_? That call did run, but I don't know whether the result is right or wrong.

Thank you

Konrad Rudolph
  • Does this https://stackoverflow.com/questions/55533116/pca-inverse-transform-in-sklearn answer your question? – CutePoison Aug 11 '21 at 08:39

1 Answer

When you do PCA with n_components < n_features, you lose information, so you cannot recover exactly the same data when you transform back (see this SO answer).
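As a rough illustration of that information loss, the sketch below fits PCA with different numbers of components and measures the reconstruction error after inverse_transform. The data is a synthetic stand-in (random numbers with 16 features), not the temperature profiles from the question, and the choice of 2, 8 and 16 components is only an assumption made for the demonstration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the real data: 200 samples, 16 correlated features.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 16)) @ rng.normal(size=(16, 16))

errors = {}
for k in (2, 8, 16):
    pca = PCA(n_components=k)
    # Project onto k components, then map back to the 16-feature space.
    restored = pca.inverse_transform(pca.fit_transform(data))
    # Mean squared difference between original and restored values.
    errors[k] = float(np.mean((data - restored) ** 2))
    print(k, errors[k])
```

The error shrinks as more components are kept, and it only drops to (numerically) zero when n_components equals the number of features.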

You can think of it as having a 1024x1024 picture: you scale it down to 784x784 and then want to scale it back up to 1024x1024. That cannot be done 1:1. You can still see the image, but it might be a bit blurry.
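On the specific error in the question: with the default whiten=False, inverse_transform expects scores of shape (n_samples, n_components), while each row of pca.components_ is already a direction in the original feature space. So there is nothing to invert for the components themselves. The following is a minimal sketch of how the reconstruction splits into a mean plus one contribution per component. It uses synthetic data as a stand-in for the temperature profiles, and the names pc1_part, pc2_part and unit are made up for the illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the real data: 200 samples, 16 features.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 16)) @ rng.normal(size=(16, 16))

pca = PCA(n_components=2)
proj = pca.fit_transform(data)               # scores, shape (200, 2)
inversed_data = pca.inverse_transform(proj)  # shape (200, 16)

# Contribution of each component in the original feature space:
# score of every sample times that component's loading vector.
pc1_part = np.outer(proj[:, 0], pca.components_[0])
pc2_part = np.outer(proj[:, 1], pca.components_[1])

# mean + PC1 part + PC2 part reproduces inverse_transform(proj).
same = np.allclose(pca.mean_ + pc1_part + pc2_part, inversed_data)
print(same)

# inverse_transform wants (n_samples, n_components); a score of
# [1, 0] maps to the mean profile shifted by the first component.
unit = pca.inverse_transform(np.array([[1.0, 0.0]]))
print(np.allclose(unit[0], pca.mean_ + pca.components_[0]))
```

Under these assumptions, pc1_part and pc2_part are in the same units as the original data, so they can be compared with it directly (after adding or subtracting pca.mean_ as needed).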

CutePoison
  • First, thank you for answering. The original data and the inversed data differ because the PCA model has fewer components than the original data has features. I understand what you said, and I am reading [this](https://stackoverflow.com/questions/55533116/pca-inverse-transform-in-sklearn). What I want to know is how to inverse-transform pca.components_, which holds the coefficient values. – python_user Aug 11 '21 at 14:43
  • So you want to know how to get back to the original space, or just what the coefficients tell us, or...? – CutePoison Aug 12 '21 at 05:00
  • I want to know how the pca.components_ values (which are of order 1) turn into the original data (which is of order 10). But now I know it is impossible! – python_user Aug 12 '21 at 07:18