Let X be an m×n dataset (m: number of records, n: number of attributes). When the number of attributes n is large and the dataset X is noisy, classification becomes more complicated and classification accuracy decreases. One way to overcome this problem is to use a linear transformation, i.e., to perform classification on Y = XR, where R is an n×p matrix and p ≤ n. How does a linear transformation simplify classification, and why does classification accuracy increase if we classify the transformed data Y when X is noisy?
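For concreteness, here is a minimal NumPy sketch of the setup (the sizes and the random R are arbitrary placeholders, not any particular method):

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
m, n, p = 500, 100, 10

rng = np.random.default_rng(0)
X = rng.normal(size=(m, n))    # the m x n dataset

# R is any n x p matrix with p <= n; a random one serves as a placeholder.
R = rng.normal(size=(n, p))

Y = X @ R                      # the transformed m x p dataset
print(Y.shape)                 # (500, 10)
```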
user1468089
- In high dimensional space the notion of distance [becomes meaningless](http://en.wikipedia.org/wiki/Clustering_high-dimensional_data). Many elaborate classifiers rely heavily on some distance measure, so I'm guessing that's one possible reason. Having said that, consider posting your question on [Computer Science](http://cs.stackexchange.com/) or [Cross Validated](http://stats.stackexchange.com/). Here, currently, it's off-topic. – BartoszKP Apr 26 '14 at 19:36
1 Answer
Not every linear transformation will help, but some are useful in some settings. Specifically, principal component analysis (PCA) and factor analysis are linear transformations often used for dimensionality reduction.
The basic idea is that most of the information is probably contained in a few linear combinations of the features of the dataset, and that by throwing the rest away, we force ourselves to use simpler models and overfit less.
This isn't always so great. For example, even if one of the features is actually the thing we're trying to classify, it could still be discarded by PCA if it has low variance, and we would lose important information.
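As a rough illustration (a scikit-learn sketch; the synthetic dataset and the choice of 10 components are assumptions made for the demo, not a recipe), here is the same classifier fitted with and without PCA when most features are noise:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# 200 features, only 10 of them informative -- the rest are noise.
X, y = make_classification(n_samples=500, n_features=200,
                           n_informative=10, random_state=0)

plain = LogisticRegression(max_iter=1000)
reduced = make_pipeline(PCA(n_components=10),
                        LogisticRegression(max_iter=1000))

print(cross_val_score(plain, X, y, cv=5).mean())
print(cross_val_score(reduced, X, y, cv=5).mean())
```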

Guy Adini
- I believe that this is related to the Johnson–Lindenstrauss lemma: http://en.wikipedia.org/wiki/Johnson%E2%80%93Lindenstrauss_lemma. A random embedding of a small set of high-dimensional data points into a low-dimensional space will preserve distances in a well-behaved manner. – Guy Adini Apr 30 '14 at 19:03
- After you preserve distances, the better separability most likely stems from not overfitting. – Guy Adini Apr 30 '14 at 19:05
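A quick numerical check of the Johnson–Lindenstrauss comment above (a NumPy/SciPy sketch; the sizes are arbitrary): project high-dimensional points through a scaled Gaussian matrix and compare pairwise distances before and after.

```python
import numpy as np
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10000))    # a small set of high-dimensional points

p = 1000
# Scaling by 1/sqrt(p) preserves squared distances in expectation.
R = rng.normal(size=(10000, p)) / np.sqrt(p)
Y = X @ R

ratios = pdist(Y) / pdist(X)        # projected distance / original distance
print(ratios.min(), ratios.max())   # both should be close to 1
```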