
I am trying to use Linear Discriminant Analysis from the scikit-learn library to perform dimensionality reduction on my data, which has more than 200 features. However, I could not find an inverse_transform function in the LDA class.

I just wanted to ask: how can I reconstruct the original data from a point in the LDA domain?

Edit, based on the answers of @bogatron and @kazemakase:

I think the term "original data" was wrong; I should have said "original coordinates" or "original space". I know that without all of the principal components we cannot reconstruct the original data exactly, but when we build a shape space we project the data down to a lower dimension with the help of PCA. PCA tries to explain the data with only 2 or 3 components that capture most of its variance, and if we reconstruct the data based on them, it should show us the parts of the shape that cause the separation.

I checked the source code of the scikit-learn LDA again and noticed that the eigenvectors are stored in the scalings_ attribute. With the svd solver it is not possible to invert the eigenvector matrix (scalings_) directly, because it is not square, but when I took the pseudo-inverse of the matrix, I could reconstruct the shape.
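Roughly, what I tried looks like the sketch below (for the svd solver; X and y here are just random stand-ins for my own shape data, and scalings_ / xbar_ are the fitted LDA attributes):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# random stand-in data: 500 samples, 200 features, 3 classes
X = np.random.rand(500, 200)
y = np.random.randint(0, 3, 500)

lda = LinearDiscriminantAnalysis(solver='svd', n_components=2)
Z = lda.fit(X, y).transform(X)  # project into the 2-D LDA space

# scalings_ is not square, so take its pseudo-inverse to map back;
# the svd solver centers the data on xbar_, so add it back at the end
X_back = Z @ np.linalg.pinv(lda.scalings_) + lda.xbar_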

Here are two images reconstructed from the points [4.28, 0.52] and [0, 0], respectively:

[Image: reconstruction from [4.28, 0.52]]  [Image: reconstruction from [0, 0]]

I think it would be great if someone could explain the mathematical limitations of the LDA inverse transform in depth.

Babak Hashemi

2 Answers


There is no inverse transform because, in general, you cannot return from the lower-dimensional feature space to your original coordinate space.

Think of it like looking at your 2-dimensional shadow projected on a wall. You can't get back to your 3-dimensional geometry from a single shadow because information is lost during the projection.

To address your comment regarding PCA, consider a data set of 10 random 3-dimensional vectors:

In [1]: import numpy as np

In [2]: from sklearn.decomposition import PCA

In [3]: X = np.random.rand(30).reshape(10, 3)

Now, what happens if we apply the Principal Components Transformation (PCT), reduce the dimensionality by keeping only the top 2 (out of 3) PCs, and then apply the inverse transform?

In [4]: pca = PCA(n_components=2)

In [5]: pca.fit(X)
Out[5]: 
PCA(copy=True, iterated_power='auto', n_components=2, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)

In [6]: Y = pca.transform(X)

In [7]: X.shape
Out[7]: (10, 3)

In [8]: Y.shape
Out[8]: (10, 2)

In [9]: XX = pca.inverse_transform(Y)

In [10]: X[0]
Out[10]: array([ 0.95780971,  0.23739785,  0.06678655])

In [11]: XX[0]
Out[11]: array([ 0.87931369,  0.34958407, -0.01145125])

Obviously, the inverse transform did not reconstruct the original data. The reason is that by dropping the last PC (the one with the least variance), we lost information. Next, let's see what happens if we retain all PCs (i.e., we do not apply any dimensionality reduction):

In [12]: pca2 = PCA(n_components=3)

In [13]: pca2.fit(X)
Out[13]: 
PCA(copy=True, iterated_power='auto', n_components=3, random_state=None,
  svd_solver='auto', tol=0.0, whiten=False)

In [14]: Y = pca2.transform(X)

In [15]: XX = pca2.inverse_transform(Y)

In [16]: X[0]
Out[16]: array([ 0.95780971,  0.23739785,  0.06678655])

In [17]: XX[0]
Out[17]: array([ 0.95780971,  0.23739785,  0.06678655])

In this case, we were able to reconstruct the original data because we didn't throw away any information (since we retained all the PCs).
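As a quick follow-up sketch (continuing the session above; this is an extra check, not part of the original example), two things make the information loss concrete: transforming the reconstruction again recovers Y exactly, because XX lies entirely in the retained 2-D subspace, and the total squared reconstruction error equals the variance captured by the dropped third component:

# continuing with X, Y, XX and pca (n_components=2) from the session above
import numpy as np
from sklearn.decomposition import PCA

# the reconstruction lies in the retained subspace, so transforming it
# again gives back the reduced coordinates exactly
print(np.allclose(pca.transform(XX), Y))  # True

# the total squared reconstruction error equals the variance of the dropped
# PC (explained_variance_ is S**2 / (n_samples - 1) for each component)
pca_full = PCA(n_components=3).fit(X)
err = ((X - XX) ** 2).sum()
print(np.allclose(err, (len(X) - 1) * pca_full.explained_variance_[2]))  # True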

The situation with LDA is even worse, because the maximum number of components that can be retained is not 200 (the number of features of your input data); rather, the maximum number of components you can retain is n_classes - 1. So if, for example, you were doing a binary classification problem (2 classes), the LDA transform would go from 200 input dimensions down to just a single dimension.
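A quick illustration of that limit, using random stand-in data with 200 features and 2 classes (the numbers here are only placeholders):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# random stand-in data: 500 samples, 200 features, 2 classes
X = np.random.rand(500, 200)
y = np.repeat([0, 1], 250)

lda = LinearDiscriminantAnalysis()
Z = lda.fit(X, y).transform(X)

print(Z.shape)  # (500, 1) -- at most n_classes - 1 = 1 component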

bogatron
  • No, it's not true; in general it is possible to do that! For example, you can reconstruct your data after performing PCA: [PCA inverse_transform](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA.inverse_transform). – Babak Hashemi Mar 22 '17 at 19:48
  • That's only true in general if you do not perform dimensionality reduction. In your question, you stated "to perform dimensionality reduction". The Principal Components transformation is simply a shift (if mean subtraction is required) and rotation of coordinates so you can apply the reverse rotation and shift to get back your original data. But if you reduce the number of components, you will lose information (except for the special case where the dropped components have zero variance) so you will not get back your original data. – bogatron Mar 22 '17 at 20:17
  • I don't know if you are familiar with shape spaces. Basically, you choose a point in 2D or 3D and reconstruct the shape in the original space of the shape; that means using only 2 or 3 principal components (which capture most of the variance of the data) you can guess the shape, and you don't need to know the rest of the components. Maybe I should use "original space" instead of "original data". – Babak Hashemi Mar 22 '17 at 21:02
  • No, I'm not familiar with shape spaces but I think you're at the heart of the matter with the distinction between "original space" and "original data". For PCA with dimensionality reduction, you can go back to the original space because it's simply a matter of zeroing coefficients associated with dropped eigen-values/vectors in the reverse transform. For LDA, you don't have a 200-dimensional transformed space in which you can zero out coefficients to transform back to the original space. – bogatron Mar 22 '17 at 21:20

The inverse of the LDA does not necessarily make sense, because the LDA transform loses a lot of information.

For comparison, consider the PCA. Here we get a coefficient matrix that is used to transform the data. We can do dimensionality reduction by stripping rows from the matrix. To get the inverse transform, we first invert the full matrix and then remove the columns corresponding to the removed rows.

The LDA does not give us a full matrix. We only get a reduced matrix that cannot be directly inverted. It is possible to take the pseudo-inverse, but this is much less effective than if we had the full matrix at our disposal.

Consider a simple example:

C = np.ones((3, 3)) + np.eye(3)  # full transform matrix
U = C[:2, :]  # dimensionality reduction matrix
V1 = np.linalg.inv(C)[:, :2]  # PCA-style reconstruction matrix
print(V1)
#array([[ 0.75, -0.25],
#       [-0.25,  0.75],
#       [-0.25, -0.25]])

V2 = np.linalg.pinv(U)  # LDA-style reconstruction matrix
print(V2)
#array([[ 0.63636364, -0.36363636],
#       [-0.36363636,  0.63636364],
#       [ 0.09090909,  0.09090909]])

If we have the full matrix, we get a different inverse transform (V1) than if we simply pseudo-invert the reduced transform (V2). That is because in the second case we have lost all information about the discarded components.
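To make the difference concrete, here is a small sketch (restating C, U, V1 and V2 from above so it runs on its own): both matrices are exact right-inverses of the reduced transform U, yet they send the same reduced vector back to different points in 3-D:

import numpy as np

C = np.ones((3, 3)) + np.eye(3)   # full transform matrix (as above)
U = C[:2, :]                      # dimensionality reduction matrix
V1 = np.linalg.inv(C)[:, :2]      # reconstruction via the full inverse
V2 = np.linalg.pinv(U)            # reconstruction via the pseudo-inverse

# both undo the reduced transform exactly ...
print(np.allclose(U @ V1, np.eye(2)))  # True
print(np.allclose(U @ V2, np.eye(2)))  # True

# ... but they reconstruct the same reduced point differently
x = np.array([1.0, 2.0, 3.0])
y = U @ x                         # reduced 2-D representation of x
print(V1 @ y)                     # [ 3.25  4.25 -3.75]
print(V2 @ y)                     # approx. [1.545, 2.545, 1.364]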

You have been warned. If you still want to do the inverse LDA transform, here is a function:

import matplotlib.pyplot as plt

from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

from sklearn.utils.validation import check_is_fitted
from sklearn.utils import check_array

import numpy as np


def inverse_transform(lda, x):
    """Map points from the LDA space back to the original feature space."""
    if lda.solver == 'lsqr':
        raise NotImplementedError("(inverse) transform not implemented for 'lsqr' "
                                  "solver (use 'svd' or 'eigen').")
    check_is_fitted(lda, ['xbar_', 'scalings_'], all_or_any=any)

    # scalings_ is not square, so fall back to its pseudo-inverse
    inv = np.linalg.pinv(lda.scalings_)

    x = check_array(x)
    if lda.solver == 'svd':
        # the svd solver centers the data on xbar_ before projecting
        x_back = np.dot(x, inv) + lda.xbar_
    elif lda.solver == 'eigen':
        x_back = np.dot(x, inv)

    return x_back


iris = datasets.load_iris()

X = iris.data
y = iris.target
target_names = iris.target_names

lda = LinearDiscriminantAnalysis()
Z = lda.fit(X, y).transform(X)

Xr = inverse_transform(lda, Z)

# plot first two dimensions of original and reconstructed data
plt.plot(X[:, 0], X[:, 1], '.', label='original')
plt.plot(Xr[:, 0], Xr[:, 1], '.', label='reconstructed')
plt.legend()

[Image: scatter plot of the first two feature dimensions, original vs. reconstructed iris data]

You see, the result of the inverse transform does not have much to do with the original data (well, it's possible to guess the direction of the projection). A considerable part of the variation is gone for good.
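One way to quantify how much variation is lost (continuing with X and Xr from the script above) is to compare the total per-feature variance of the original data with that of the reconstruction, which is confined to a 2-D subspace:

# continuing with X and Xr from the script above
print(X.var(axis=0).sum())   # total variance of the original iris data
print(Xr.var(axis=0).sum())  # total variance of the reconstruction (smaller)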

MB-F