
I have an N×M dataset in binary form. I apply a variety of dimensionality reduction techniques to it and plot the first two dimensions; this is how I get an intuition of whether a technique is suitable for my dataset. Is there a more appropriate/methodical/heuristic/formal way to test the suitability of the dimensionality reduction techniques I use?

JustCurious

2 Answers


The main purpose of applying dimensionality reduction to data is to preserve the original data's distribution as much as possible after the reduction. Therefore, we want to make sure we capture as much of the data's variance as we can.

Let's say you have an N×N matrix X, and we perform SVD (Singular Value Decomposition) on it: X = USVᵀ. We then look at the singular values, the diagonal entries of the resulting matrix S.

You then cut them off at some index K, chosen by the desired percentage of variance captured:

$$\frac{\sum_{i=1}^{K} \sigma_i}{\sum_{i=1}^{N} \sigma_i}$$

If you then keep only the first K columns of U (scaled by the corresponding singular values), you have reduced the original N-dimensional representation to K dimensions.
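Concretely, this cut-off can be computed in a few lines. Below is a minimal sketch in Python/NumPy, not from the answer itself; the random binary matrix, the 90% threshold, and the variable names are illustrative assumptions:

```python
import numpy as np

# Toy stand-in for the N x M binary data matrix (illustrative assumption)
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 10)).astype(float)
X = X - X.mean(axis=0)  # center so the singular values reflect variance

# Thin SVD: X = U @ diag(s) @ Vt, with s sorted in decreasing order
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Cumulative fraction of singular-value mass, as in the formula above
captured = np.cumsum(s) / s.sum()
K = int(np.searchsorted(captured, 0.90)) + 1  # smallest K capturing >= 90%

X_reduced = U[:, :K] * s[:K]  # K-dimensional coordinates of each sample
print(K, X_reduced.shape)
```

Plotting `captured` against K is also a quick visual check (a scree plot) for picking the cut-off by eye.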

aerin

You could use the SOM (self-organizing map) technique to view several dimensions in two. There are other techniques as well; I will update the answer if I can remember their names, but I am used to SOM.

You can find a good SOM toolbox for MATLAB.

This helps you visualize, but the evaluation should use a quality measure that captures what matters for your dimensionality reduction (the SOM itself may be used as a dimensionality reduction technique). What is important: compressing the data with minimal loss? Compressing the data as much as possible? Representing the data in a visually interpretable way? You can probably measure a technique's quality without needing to see how it changed the data's representation; all you need is a good function that measures how well the technique meets your goal, as in the sketch below.
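For instance, if "compress with minimal loss" is the goal, reconstruction error works as such a function. The following is a minimal sketch in Python/NumPy rather than MATLAB; the rank-K SVD baseline and all names are illustrative assumptions, and the same scoring idea applies to any reduction technique that can map points back to the original space:

```python
import numpy as np

def reconstruction_error(X, K):
    """Mean squared error of a rank-K SVD reconstruction of X.
    One possible 'efficiency meter' for the minimal-loss goal."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    X_hat = (U[:, :K] * s[:K]) @ Vt[:K]  # rank-K approximation of Xc
    return float(np.mean((Xc - X_hat) ** 2))

# Toy binary data (illustrative assumption); error shrinks as K grows
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 8)).astype(float)
for K in (1, 2, 4, 8):
    print(K, reconstruction_error(X, K))
```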

Werner