pca.inverse_transform in sklearn

Question

I have my data in X and fit a PCA to it:

from sklearn.decomposition import PCA

pca = PCA(n_components=1)
X_pca = pca.fit_transform(X)  # fit_transform already fits, so a separate pca.fit(X) is not needed

Now X_pca has one dimension.

When I perform the inverse transformation, isn't it by definition supposed to return the original data, that is X, a 2-D array?

When I do

X_ori = pca.inverse_transform(X_pca)

I get the same dimensions but different numbers.

Also, if I plot both X and X_ori, they are different.

butterflyknife · Accepted Answer · 2019-04-05T14:44:03.337

When I perform inverse transformation by definition isn't it supposed to return to original data

No, you can only expect this if the number of components you specify equals the dimensionality of the input data. For any n_components smaller than that, the inverse PCA transformation returns numbers that differ from the original dataset. The following diagrams give an illustration in two dimensions.
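The same point can be checked numerically. This is only a minimal sketch: the random 10 x 2 array, the seed, and the variable names are assumptions made for illustration, not part of the question.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.random((10, 2))         # hypothetical 2-D data set

# Keep 1 of 2 components: the round trip loses information.
pca_1 = PCA(n_components=1)
X_back_1 = pca_1.inverse_transform(pca_1.fit_transform(X))

# Keep both components: the round trip is exact up to floating-point rounding.
pca_2 = PCA(n_components=2)
X_back_2 = pca_2.inverse_transform(pca_2.fit_transform(X))

print(X_back_1.shape, np.allclose(X, X_back_1))
print(X_back_2.shape, np.allclose(X, X_back_2))
```

Both reconstructions have the same shape as X, but only the one that keeps all components should match X numerically.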

score 6 · Answer 2 · answered Apr 05 '19 at 10:28

It cannot do that: by reducing the dimensions with PCA, you have lost information (check pca.explained_variance_ratio_ for the fraction of the variance you have retained). However, it gets as close to the original data as it can in the original space; see the picture below

(generated with

import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA

pca = PCA(1)  # keep a single principal component
X_orig = np.random.rand(10, 2)
# Project to 1-D, then map back into the original 2-D space
X_re_orig = pca.inverse_transform(pca.fit_transform(X_orig))

plt.scatter(X_orig[:, 0], X_orig[:, 1], label='Original points')
plt.scatter(X_re_orig[:, 0], X_re_orig[:, 1], label='InverseTransform')
# Connect each original point to its reconstruction
for i in range(10):
    plt.plot([X_orig[i, 0], X_re_orig[i, 0]], [X_orig[i, 1], X_re_orig[i, 1]])
plt.legend()
plt.show()

) If you had kept the number of dimensions the same (pca = PCA(2)), you would recover the original points (the new points lie on top of the original ones).
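To make "as well as it can" concrete, here is a rough sketch of what the inverse step amounts to, assuming the default whiten=False. The data, the seed, and the names X_manual, lost, and retained are made up for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
X = rng.random((10, 2))         # hypothetical 2-D data set

pca = PCA(n_components=1)
X_pca = pca.fit_transform(X)           # shape (10, 1)
X_back = pca.inverse_transform(X_pca)  # shape (10, 2)

# With whiten=False, mapping back is a linear map plus the stored mean:
# each 1-D score is placed along the principal axis in the original space.
X_manual = X_pca @ pca.components_ + pca.mean_
same = np.allclose(X_back, X_manual)

# Share of the total variance that the reconstruction fails to reproduce.
lost = np.sum((X - X_back) ** 2) / np.sum((X - X.mean(axis=0)) ** 2)
retained = pca.explained_variance_ratio_.sum()

print(same, np.isclose(lost + retained, 1.0))
```

The reconstructed points all lie on a single line through the data mean, and the share of variance they miss should be the part that explained_variance_ratio_ does not account for.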

once information has been lost how does it try to go back to 2-D? Also then why do we even use inverse_transform? — haneulkim, Apr 05 '19 at 10:34
