
This question concerns how to de-center and "restore" the data in a lower dimension after performing PCA.

I'm doing a simple principal component analysis with sklearn. As I understand it, the implementation should take care of (1) centering the data when creating components and (2) de-centering the data after transformation. However, the transformed data is still centered around the origin. How can I project the data into a lower-dimensional space while preserving the characteristics of the original data? Since I would be doing dimensionality reduction on high-dimensional data, I wouldn't know the appropriate mean for each principal component; how can that be derived?
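For reference, my understanding of what `transform` does internally (a minimal sketch using sklearn's documented `mean_` and `components_` attributes, assuming no whitening):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_demo = rng.random((6, 3))                # toy data standing in for my real data
pca_demo = PCA(n_components=2).fit(X_demo)

# transform() subtracts the fitted mean and projects onto the components,
# which is why the output is centred around the origin
manual = (X_demo - pca_demo.mean_) @ pca_demo.components_.T
assert np.allclose(manual, pca_demo.transform(X_demo))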

Reducing 3 dimensions to 2 dimensions:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 -- registers the '3d' projection on older matplotlib
from sklearn.decomposition import PCA

X = np.array([[-1, -1, -1], [-2, -1, -1], [-3, -2, -3], [1, 1, 1], [2, 1, 2], [3, 2, 3]]) + 3
X.shape

(6, 3)

fig = plt.figure(figsize=(10, 8), dpi=80, facecolor='w', edgecolor='k')
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='*')  # 3-D scatter of the raw data
plt.title('original')
plt.show()

[Figure: 3-D scatter plot of the original data]

PCA with 2 components:

pca = PCA(n_components=2)
pca.fit(X)                    # learns the components and the mean of X
X_trans = pca.transform(X)    # scores in the 2-D component space
X_trans.shape

(6, 2)
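The scores are indeed centred, which can be verified directly:

# The column means of the transformed data are (numerically) zero
print(X_trans.mean(axis=0))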

plt.plot(X_trans[:,0], X_trans[:,1], '*')
plt.show()

[Figure: the transformed data, centred around the origin]

What I would like to do at this stage is to "restore" my data in this lower dimension, so that the values of the data points correspond to the original data. It should still have only 2 dimensions, but not be centered around the mean.

Performing the inverse transform, as suggested below, actually brings me back to 3 dimensions:

X_approx = pca.inverse_transform(X_trans)  # back in the original 3-D space
X_approx.shape

(6, 3)
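If I understand correctly, `inverse_transform` un-projects the scores through the components and adds the fitted mean back, which is why the result has 3 columns again. A sketch of that equivalence, reusing `pca` and `X_trans` from above:

# inverse_transform() re-projects onto the original axes and re-adds the mean
manual_back = X_trans @ pca.components_ + pca.mean_
assert np.allclose(manual_back, pca.inverse_transform(X_trans))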

I want to remain in 2 dimensions but still have my data resemble its original form as closely as possible, and not be centered around the mean.
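My current guess (an assumption on my part, not something I've found in the sklearn docs) is that the missing mean could be derived by projecting the fitted mean onto the component axes and adding it back to the 2-D scores:

# Coordinates of the original mean along the principal axes
mean_in_pc_space = pca.mean_ @ pca.components_.T   # shape (2,)
X_trans_decentered = X_trans + mean_in_pc_space    # still shape (6, 2)

Is this the right way to derive that mean?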

  • I agree with the answer, but I think your actual goal is not to "restore" data but to somehow project the means into the lower dimensional space. Is that correct? – MB-F Mar 20 '19 at 14:55
  • @kazemakase: That's what is being done here I guess. The new restored data is the same data **but** projected in a 2-dimensional space because `n_components=2` during fit. – Sheldore Mar 20 '19 at 14:58
  • changed my wording to "de-centering" – Mountain_sheep Mar 20 '19 at 15:24
  • @Mountain_sheep: I see the problem now. Read my comment and my edited answer – Sheldore Mar 20 '19 at 15:30

1 Answer


You are just fitting the data and plotting the transformed data. To recover (an approximation of) the original data, you need to use inverse_transform, which maps the transformed data back to the original space, as I show below in the plot. From the docs:

inverse_transform(X)

Transform data back to its original space.

pca = PCA(n_components=2)
pca.fit(X)

X_trans = pca.transform(X)                    # project into the 2-D component space
X_original = pca.inverse_transform(X_trans)   # map back to the original 3-D space
plt.plot(X_original[:,0], X_original[:,1], 'r*')
plt.show()

[Figure: the inverse-transformed data plotted in the first two original coordinates]
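Note that with `n_components=2` the reconstruction is only an approximation of `X` (the variance along the dropped third component is lost). You can quantify this, for example:

# Reconstruction error: small when the dropped component carries little variance
print(np.linalg.norm(X - X_original))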

  • Thanks, but the data is still centered around the y-axis...? – Mountain_sheep Mar 20 '19 at 15:05
  • @Mountain_sheep: The original data is just a relationship between one independent variable and one dependent variable – Sheldore Mar 20 '19 at 15:23
  • @Mountain_sheep: I see the problem now. You were just fitting using `pca.fit(X)` but then you have to apply the `pca` to the original data to transform it to `X_trans` using `transform`. Once you have transformed, you can transform back to original data using `inverse_transform` – Sheldore Mar 20 '19 at 15:29
  • Like I describe in the new edit, this method gets me back to my original dimension; I want to stay in the lower dimension but de-center my data. – Mountain_sheep Mar 20 '19 at 18:47