I'm given a 20×122 matrix in which each column is a 20-element vector. I applied UMAP to the matrix and got a 2×122 matrix and a UMAP plot. How can I measure the goodness of fit of this UMAP model? Is there a standard way to do that?
1 Answer
UMAP serves several purposes, such as clustering, supervised learning, and outlier detection.
What exactly do you want to do with it?
For clustering, take a look at the clustering evaluation metrics in sklearn (for example the silhouette score) and compare the scores with those of other algorithms such as t-SNE.
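A minimal sketch of such a comparison, under several assumptions: the data is a synthetic placeholder from make_blobs, the number of clusters (3) is a guess, and the silhouette is computed in the original 20-dimensional space using the cluster labels found in each embedding. A UMAP embedding from the umap-learn package could be added to the dictionary in the same way.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import silhouette_score

# Placeholder data: 122 samples with 20 features (rows are samples).
X, _ = make_blobs(n_samples=122, n_features=20, centers=3, random_state=0)

embeddings = {
    "pca": PCA(n_components=2, random_state=0).fit_transform(X),
    "tsne": TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X),
    # Assumption: with umap-learn installed, add
    # "umap": umap.UMAP(n_components=2).fit_transform(X)
}

silhouettes = {}
for name, emb in embeddings.items():
    # Cluster in the 2-D embedding; n_clusters=3 is an illustrative choice.
    labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(emb)
    # Score those labels against the original high-dimensional data.
    silhouettes[name] = silhouette_score(X, labels)

print(silhouettes)
```

A higher silhouette suggests that the groups visible in the embedding also hold up in the original space.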
To inspect the structure, reduce your data to 2 or 3 dimensions and check the result visually in a scatter plot.
If you have labeled data, you can train a (nonlinear) classifier such as a random forest on the reduced data and compare its score (e.g. accuracy) with the score obtained after other dimensionality reduction techniques such as PCA.
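A rough sketch of the labeled comparison, assuming synthetic data from make_classification and a hypothetical helper named reduced_accuracy. The reducer is placed inside a pipeline so that it is refit on each cross-validation fold. umap.UMAP from umap-learn follows the sklearn fit/transform interface, so it should be usable in place of PCA here.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Placeholder labeled data: 122 samples, 20 features.
X, y = make_classification(
    n_samples=122, n_features=20, n_informative=5, random_state=0
)

def reduced_accuracy(reducer, X, y):
    """Mean 5-fold accuracy of a random forest trained on the reduced data."""
    model = make_pipeline(
        reducer, RandomForestClassifier(n_estimators=100, random_state=0)
    )
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()

acc_pca = reduced_accuracy(PCA(n_components=2), X, y)
# Assumption: with umap-learn installed,
# acc_umap = reduced_accuracy(umap.UMAP(n_components=2), X, y)
print(f"PCA + random forest accuracy: {acc_pca:.3f}")
```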
Maybe you are looking for the trustworthiness score from sklearn (sklearn.manifold.trustworthiness). You can compare the score of PCA with the score of UMAP or any other dimensionality reduction algorithm.
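A small sketch of the trustworthiness comparison, which needs no labels. The random matrix M is only a placeholder for the 20×122 data in the question. Since sklearn expects samples in rows, it is transposed first. The value n_neighbors=10 is an arbitrary choice, and the UMAP line is commented out on the assumption that umap-learn may not be installed.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness

rng = np.random.default_rng(0)
M = rng.normal(size=(20, 122))  # placeholder for the 20x122 matrix
X = M.T                         # shape (122, 20): one sample per row

emb_pca = PCA(n_components=2, random_state=0).fit_transform(X)
# Assumption: with umap-learn installed,
# emb_umap = umap.UMAP(n_components=2, random_state=0).fit_transform(X)

scores = {
    # Trustworthiness lies in [0, 1]; higher means the neighbors seen in
    # the embedding are more often true neighbors in the original space.
    "pca": trustworthiness(X, emb_pca, n_neighbors=10),
    # "umap": trustworthiness(X, emb_umap, n_neighbors=10),
}
print(scores)
```

Since the score depends on n_neighbors, it may be worth comparing the methods at a few different values rather than a single one.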

Wuuzzaa
-
Sorry for the very late reply. I'm using UMAP to discover the structure of a 20-dim dataset. Clustering doesn't matter. – Josie G Oct 12 '22 at 05:17
-
I got scatter plots for both UMAP and PCA, but how do I compare them? I don't have labels, so I'm using both methods unsupervised. I tried a method called the Shepard diagram, but it is mainly used with MDS, and I'm not sure if it also works for UMAP and PCA. – Josie G Oct 16 '22 at 21:16
-
Thanks for your help. I compared UMAP and PCA and found that PCA performs better than UMAP. Is this expected? I thought UMAP was more advanced than PCA. – Josie G Oct 23 '22 at 03:44
-
It depends on the data. UMAP is much more advanced, as you said. You can play around with the hyperparameters of UMAP to improve the results. – Wuuzzaa Nov 07 '22 at 16:32