The main parameters I am using to create the UMAP are min_dist, a and b. I have set min_dist=0.5, a=1, b=1,
which gives a meaningful low-dimensional representation for most datasets when all the features are used (approximately 10K to 30K features). But when I reduce the number of features with a feature selection method (200-500 features are selected), the low-dimensional UMAP representation is no longer meaningful (e.g., it becomes very sparse and stringy). I then have to keep tuning the parameters until the 2D visualization makes sense.
Is there any way to avoid this manual tuning and choose the parameter values according to the number of features selected?
P.S. I am not a maths student and have only a vague understanding of how UMAP works. I have not implemented the algorithm myself; I am using the Seurat package's RunUMAP function on single-cell data.
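
For reference, here is a rough sketch of how min_dist, a and b relate to each other, as far as I can tell from the UMAP documentation. It is written in plain Python rather than R so that it runs without any packages. The low-dimensional similarity curve has the form 1 / (1 + a * d^(2b)), and when a and b are not given they are normally fitted from min_dist and spread. My understanding is that in umap-learn and uwot, supplying a and b explicitly bypasses that fit, so min_dist would have no further effect; that should be checked against the installed version. The function names and the grid search below are my own illustration, not the library's code (real implementations use nonlinear least squares).

```python
import math


def low_dim_similarity(d, a, b):
    """Similarity assigned to two embedded points at distance d."""
    return 1.0 / (1.0 + a * d ** (2.0 * b))


def target_curve(d, min_dist, spread=1.0):
    """Piecewise target: flat at 1 up to min_dist, then exponential decay."""
    if d <= min_dist:
        return 1.0
    return math.exp(-(d - min_dist) / spread)


def approx_ab(min_dist, spread=1.0, n_points=300):
    """Crude grid search for the (a, b) that best match the target curve.

    Assumption: this only approximates the least-squares fit that real
    implementations perform; the grid resolution is 0.05 in both a and b.
    """
    xs = [3.0 * spread * i / (n_points - 1) for i in range(n_points)]
    ys = [target_curve(x, min_dist, spread) for x in xs]
    best = None
    for ai in range(2, 61):        # a from 0.10 to 3.00
        a = ai * 0.05
        for bi in range(10, 41):   # b from 0.50 to 2.00
            b = bi * 0.05
            err = sum((low_dim_similarity(x, a, b) - y) ** 2
                      for x, y in zip(xs, ys))
            if best is None or err < best[0]:
                best = (err, a, b)
    return best[1], best[2]


if __name__ == "__main__":
    for md in (0.1, 0.5):
        a, b = approx_ab(md)
        print(f"min_dist={md}: a~{a:.2f}, b~{b:.2f}")
```

If that understanding is right, the a=1, b=1 setting fixes the curve by itself, and the min_dist=0.5 passed alongside it is not what shapes the embedding.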