
This question concerns how to de-center and "restore" the data in a lower dimension after performing PCA.

I'm doing a simple principal component analysis with sklearn. As I understand it, the implementation should take care of (1) centering the data when creating components and (2) de-centering the data after transformation. However, the transformed data is still centered around the origin. How can I project the data into a lower-dimensional space while preserving the characteristics of the original data? Since I would be doing dimensionality reduction on high-dimensional data, I wouldn't know the appropriate mean for each principal component; how can that be derived?
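For reference, my understanding of what `transform` does internally (a minimal sketch using sklearn's documented `mean_` and `components_` attributes, assuming no whitening):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_demo = rng.random((6, 3))                # toy data standing in for my real data
pca_demo = PCA(n_components=2).fit(X_demo)

# transform() subtracts the fitted mean and projects onto the components,
# which is why the output is centred around the origin
manual = (X_demo - pca_demo.mean_) @ pca_demo.components_.T
assert np.allclose(manual, pca_demo.transform(X_demo))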

Reducing 3 dimensions to 2 dimensions:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 -- registers the '3d' projection on older matplotlib
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3], [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3
X.shape

(6, 3)

fig = plt.figure(figsize=(10, 8), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='*')  # 3-D scatter of the raw data
plt.title('original')
plt.show()

[Figure: 3-D scatter plot of the original data]

PCA with 2 components:

pca = PCA(n_components=2)
pca.fit(X)                    # learns the components and the mean of X
X_trans = pca.transform(X)    # scores in the 2-D component space
X_trans.shape

(6, 2)
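The scores are indeed centred, which can be verified directly:

# The column means of the transformed data are (numerically) zero
print(X_trans.mean(axis=0))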

plt.plot(X_trans[:,0], X_trans[:,1], '*')
plt.show()

[Figure: the transformed data, centred around the origin]

What I would like to do at this stage is to "restore" my data in this lower dimension, so that the values of the data points correspond to the original data. It should still have only 2 dimensions, but not be centered around the mean.

Performing the inverse transform, as suggested below, actually brings me back to 3 dimensions:

X_approx = pca.inverse_transform(X_trans)  # back in the original 3-D space
X_approx.shape

(6, 3)
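If I understand correctly, `inverse_transform` un-projects the scores through the components and adds the fitted mean back, which is why the result has 3 columns again. A sketch of that equivalence, reusing `pca` and `X_trans` from above:

# inverse_transform() re-projects onto the original axes and re-adds the mean
manual_back = X_trans @ pca.components_ + pca.mean_
assert np.allclose(manual_back, pca.inverse_transform(X_trans))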

I want to remain in 2 dimensions but still have my data resemble its original form as closely as possible, and not be centered around the mean.
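My current guess (an assumption on my part, not something I've found in the sklearn docs) is that the missing mean could be derived by projecting the fitted mean onto the component axes and adding it back to the 2-D scores:

# Coordinates of the original mean along the principal axes
mean_in_pc_space = pca.mean_ @ pca.components_.T   # shape (2,)
X_trans_decentered = X_trans + mean_in_pc_space    # still shape (6, 2)

Is this the right way to derive that mean?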

  • I agree with the answer, but I think your actual goal is not to "restore" data but to somehow project the means into the lower dimensional space. Is that correct? – MB-F Mar 20 '19 at 14:55
  • @kazemakase: That's what is being done here I guess. The new restored data is the same data **but** projected in a 2-dimensional space because `n_components=2` during fit. – Sheldore Mar 20 '19 at 14:58
  • changed my wording to "de-centering" – Mountain_sheep Mar 20 '19 at 15:24
  • @Mountain_sheep: I see the problem now. Read my comment and my edited answer – Sheldore Mar 20 '19 at 15:30

1 Answer


You are just fitting the data and plotting the transformed data. To recover (an approximation of) the original data, you need to use inverse_transform, which maps the transformed data back to the original space, as I show below in the plot. From the docs:

inverse_transform(X)

Transform data back to its original space.

pca = PCA(n_components=2)
pca.fit(X)

X_trans = pca.transform(X)                    # project into the 2-D component space
X_original = pca.inverse_transform(X_trans)   # map back to the original 3-D space
plt.plot(X_original[:,0], X_original[:,1], 'r*')
plt.show()

[Figure: the inverse-transformed data plotted in the first two original coordinates]
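Note that with `n_components=2` the reconstruction is only an approximation of `X` (the variance along the dropped third component is lost). You can quantify this, for example:

# Reconstruction error: small when the dropped component carries little variance
print(np.linalg.norm(X - X_original))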

  • Thanks, but the data is still centered around the y-axis...? – Mountain_sheep Mar 20 '19 at 15:05
  • @Mountain_sheep: The original data is just a relationship between one independent variable and one dependent variable – Sheldore Mar 20 '19 at 15:23
  • @Mountain_sheep: I see the problem now. You were just fitting using `pca.fit(X)` but then you have to apply the `pca` to the original data to transform it to `X_trans` using `transform`. Once you have transformed, you can transform back to original data using `inverse_transform` – Sheldore Mar 20 '19 at 15:29
  • Like I describe in the new edit, this method gets me back to my original dimension; I want to stay in the lower dimension but de-center my data. – Mountain_sheep Mar 20 '19 at 18:47