I'm using the locally linear embedding (LLE) method in Scikit-learn for dimensionality reduction. The only examples that I could find belong to the Scikit-learn documentation here and here, but I'm not sure how I should choose the parameters of the method. In particular, is there any relation between the dimension of the data points or the number of samples and the number of neighbors (`n_neighbors`) and the number of components (`n_components`)? All of the examples in Scikit-learn use `n_components=2`; is this always the case? Finally, is there any other parameter that is critical to tune, or should I use the default settings for the rest of the parameters?

A key question is: what are you using LLE for? You see `n_components=2` when it's being used to plot higher-dimensional data in 2D. The `n_neighbors` determines how smooth things are: when you consider many neighbors, you will smooth boundaries between things -- perhaps over-smoothing. Your use case is the key: what's your goal? The Scikit-learn documentation is, unfortunately, poor here. – Wayne Apr 07 '19 at 13:33
2 Answers
Is there any relation between the dimension of data points or the number of samples and the number of neighbors (`n_neighbors`) and number of components (`n_components`)?
Generally speaking, they are not related. `n_neighbors` is often decided by the distances among samples. In particular, if you know the classes of your samples, it is better to set `n_neighbors` a little greater than the number of samples in each class. `n_components`, namely the reduced dimension size, is instead determined by the redundancy of the data in its original dimension. Based on the specific data distribution and your own needs, you can choose the proper space dimension for the projection. `n_components=2` maps the original high-dimensional space into a 2D space; it is actually just a special case.
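As a minimal sketch of the point above: the parameter values below (300 samples, 10 neighbors, 2 components) are illustrative assumptions, not recommendations; `n_components=2` is chosen here purely for a 2D projection.

```python
# Illustrative sketch: LLE on a toy 3D dataset, projected to 2D.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=300, random_state=0)  # 300 samples in 3D

# n_neighbors controls how local the reconstruction weights are;
# n_components is the target dimension of the embedding.
lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2, random_state=0)
X_embedded = lle.fit_transform(X)
print(X_embedded.shape)  # (300, 2)
```

Raising `n_neighbors` smooths the embedding over larger neighborhoods, which is exactly the trade-off the comment above describes.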
Is there any other parameter that is critical to tune, or should I use the default settings for the rest of the parameters?
Here are several other parameters you should take care of:

- `reg`, for weight regularization, which is not used in the original LLE paper. If you don't want to use it, simply set it to zero. Note, however, that the default value of `reg` is `1e-3`, which is quite small.
- `eigen_solver`. If your data size is small, it is recommended to use `dense` for accuracy. You can do more research on this.
- `max_iter`. The default value of `max_iter` is only 100, which often causes the results not to converge. If the results are not stable, choose a larger integer.
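The parameters listed above can be set directly on the estimator; a sketch, with illustrative values and `load_digits` as an assumed small dataset (note that in Scikit-learn, `max_iter` only takes effect with the `'arpack'` solver):

```python
# Sketch of the tuning parameters discussed above; values are assumptions.
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding

X, _ = load_digits(return_X_y=True)
X = X[:200]  # small dataset, so the dense solver is affordable

lle = LocallyLinearEmbedding(
    n_neighbors=12,
    n_components=2,
    reg=1e-3,              # weight regularization; this is the default
    eigen_solver="dense",  # exact eigensolver, recommended for small data
    max_iter=500,          # only used when eigen_solver="arpack"
    random_state=0,
)
X_embedded = lle.fit_transform(X)
print(lle.reconstruction_error_)  # lower is a better local reconstruction
```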

You can use GridSearch (Scikit-learn) to choose the best values for you.
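One caveat: `LocallyLinearEmbedding` has no `score()` method of its own, so a common workaround is to grid-search it inside a pipeline with a downstream estimator. The classifier choice, dataset, and parameter grid below are all illustrative assumptions:

```python
# Sketch: grid-searching LLE parameters via a pipeline with a classifier.
from sklearn.datasets import load_digits
from sklearn.manifold import LocallyLinearEmbedding
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline

X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]  # keep it small for the dense solver

pipe = Pipeline([
    ("lle", LocallyLinearEmbedding(eigen_solver="dense", random_state=0)),
    ("knn", KNeighborsClassifier()),
])
grid = GridSearchCV(
    pipe,
    param_grid={
        "lle__n_neighbors": [10, 20],
        "lle__n_components": [2, 5],
    },
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)  # parameters giving the best CV classification score
```

Note that "best" here means best for the downstream classifier, which may differ from the best embedding for visualization.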

Thank you for your answer. I am actually interested in any relationship that exists between such parameters, rather than looking for the best choice by applying GridSearch, which can be applied to other methods as well to find the values that give the best result for a particular problem. – Miranda Apr 04 '17 at 13:29