
Newbie needing help again. I'm playing around with a dataset using UMAP, a dimension-reduction tool. UMAP has two parameters that I need to tune and inspect. Previously I used t-SNE, which requires tuning only one parameter, called perplexity. To try a few perplexity values and visualize the results, I think the map function in purrr works great for automating this.

# for this purpose the sample data can be anything,
# except that my real dataset has lots of label columns
df <- data.frame(replicate(110, sample(-10:10, 1000, replace = TRUE)))
df.label <- df[, 1:20]
df.data <- df[, 21:110]

library(tsne)
library(purrr)
# set the test values for perplexity as a vector
# and map along that vector

perplex <- c(10, 20, 50, 100)
map(perplex, ~ tsne(df.data, perplexity = .x))

tsne() returns x/y coordinates for each row (sample), which I can then plot. A little help on how to automatically plot all four test results would be awesome; otherwise I have to call plot() four times, each with x = tsne[,1] and y = tsne[,2].
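A minimal sketch of plotting every perplexity run without separate plot() calls, under some assumptions: `demo.data` is a small random stand-in for df.data, the perplexity values are scaled down to suit it, `max_iter = 300` only shortens the run, and `tsne_results` is an illustrative name. `set_names()` labels each result by its perplexity, `par(mfrow = c(2, 2))` splits the device into a 2 x 2 grid, and `iwalk()` draws one panel per result, using the name as the title.

```r
library(tsne)
library(purrr)

# Small random stand-in for df.data (an assumption, to keep the run short)
set.seed(1)
demo.data <- as.data.frame(replicate(10, rnorm(200)))

perplex <- c(5, 10, 20, 50)
# One t-SNE embedding (an n x 2 matrix) per perplexity value, named for the plot titles
tsne_results <- set_names(
  map(perplex, ~ tsne(demo.data, perplexity = .x, max_iter = 300)),
  paste("perplexity =", perplex)
)

# Arrange the panels in a 2 x 2 grid and draw one scatter plot per result
old_par <- par(mfrow = c(2, 2))
iwalk(tsne_results, function(coords, title) {
  plot(coords[, 1], coords[, 2], main = title, xlab = "Dim 1", ylab = "Dim 2", pch = 20)
})
par(old_par)
```

For more runs, only `perplex` and the `mfrow` layout need to change.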

Now for UMAP, which is what I actually want to test. I want to test two parameters, n_neighbors and min_dist, the same way. The complication is that for each n_neighbors value I want to test all of the min_dist values. For example, with n_neighbors = 10, 50, 20 and min_dist = 0.1, 0.5, 1, 10, I want to run umap on my data with n_neighbors = 10 and iterate over min_dist = 0.1, 0.5, 1, 10, then repeat this for the remaining n_neighbors values.

This is where I'm stuck with purrr's map function: as far as I can tell, I can only pass it one vector.
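map() itself takes one vector, but purrr also has map2() for two vectors and pmap() for any number. Neither builds combinations on its own (map2() pairs element 1 with element 1, and so on), so a sketch of one approach: build every combination with base R's expand.grid(), listing min_dist first so it varies fastest (every min_dist value is tried for one n_neighbors value before moving on), then let pmap() walk the rows. `describe_run()` is a hypothetical placeholder that only formats a label; in real use its body would call umap() with named arguments. Separately, UMAP's min_dist is meant to be set relative to spread (default 1), and the Python umap-learn rejects min_dist > spread, so the value 10 is worth checking.

```r
library(purrr)

n_neighbors.test <- c(10, 50, 20)
min_dist.test <- c(0.1, 0.5, 1, 10)

# Cartesian product: 4 x 3 = 12 rows. The first column varies fastest, so each
# n_neighbors value is paired with every min_dist value before moving on.
param_grid <- expand.grid(min_dist = min_dist.test, n_neighbors = n_neighbors.test)

# Hypothetical placeholder for the real work; replace the body with
# umap(df.data, n_neighbors = n_neighbors, min_dist = min_dist)
describe_run <- function(min_dist, n_neighbors) {
  sprintf("n_neighbors = %g, min_dist = %g", n_neighbors, min_dist)
}

# pmap() passes each row's columns to the function, matched by argument name
run_labels <- pmap_chr(param_grid, describe_run)
print(head(run_labels, 4))
```

The same pattern scales to a 10 x 10 grid: only the two test vectors change.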

#map along a vector
n_neighbors.test= c(10,50,20)
min_dist.test= c(0.1, 0.5, 1, 10)

map(?,umap(df.data,n_neighbors = n_neighbors.test, min_dist=min_dist.test ))

Then there is also the plotting issue. umap() returns a list, and one of its elements, the layout matrix, contains the x/y coordinates of the rows.
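A minimal sketch of pulling the coordinates out of a single umap() result and plotting them, assuming the CRAN umap package (its result is a list whose layout element holds the embedding). `demo.data` is a small random stand-in for df.data, and the parameter values are arbitrary.

```r
library(umap)

# Small random stand-in for df.data (an assumption, to keep the run quick)
set.seed(1)
demo.data <- as.data.frame(replicate(10, rnorm(200)))

# Named arguments override the matching entries of umap.defaults
fit <- umap(demo.data, n_neighbors = 15, min_dist = 0.1)

# The embedding is the `layout` element: one row per input row,
# one column per output dimension
xy <- fit$layout
plot(xy[, 1], xy[, 2], pch = 20, xlab = "UMAP 1", ylab = "UMAP 2",
     main = "n_neighbors = 15, min_dist = 0.1")
```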

ML33M

1 Answer


Try:

expand.grid(n_neighbors.test, min_dist.test) %>% transpose() %>%
  map(~ umap(df.data, n_neighbors = .x[[1]], min_dist = .x[[2]]))

Alternatively, you can use nested maps:

unlist(map(n_neighbors.test, function(x) {
  map(min_dist.test, function(y) umap(df.data, n_neighbors = x, min_dist = y))
}), recursive = FALSE)
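A sketch of the nested-map approach that passes the settings through a config object instead, assuming the CRAN umap package. In that package, umap()'s second and third positional arguments are `config` and `method`, so a call like `umap(df.data, x, y)` puts a number where a method name is expected; that is the most likely source of an "'arg' must be NULL or a character vector" error from match.arg(). Copying `umap.defaults` and overwriting n_neighbors and min_dist avoids positional matching entirely. `make_config()` and `demo.data` are illustrative names, not part of the package.

```r
library(umap)
library(purrr)

# Small random stand-in for df.data (an assumption, to keep the run quick)
set.seed(1)
demo.data <- as.data.frame(replicate(10, rnorm(200)))

n_neighbors.test <- c(10, 20)
min_dist.test <- c(0.1, 0.5)

# Hypothetical helper: copy the package defaults, overwrite the two tuned settings
make_config <- function(n_neighbors, min_dist) {
  cfg <- umap.defaults
  cfg$n_neighbors <- n_neighbors
  cfg$min_dist <- min_dist
  cfg
}

# Outer map over n_neighbors, inner map over min_dist; recursive = FALSE removes
# only the outer nesting, leaving a flat list of umap results
umap_results <- unlist(map(n_neighbors.test, function(nn) {
  map(min_dist.test, function(md) umap(demo.data, config = make_config(nn, md)))
}), recursive = FALSE)
```

A plain unlist() would keep descending into each umap result and break it apart into its components, which is why `recursive = FALSE` matters here.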
Waldi
  • thank you for the code. This expand.grid function looks very promising. Let me try. Unfortunately the data is killing my computer. Let me subset some out to test this :) – ML33M Jun 18 '20 at 18:11
  • cartesian products are dangerous with large datasets! – Waldi Jun 18 '20 at 18:16
  • see my edit for an alternative which is less memory intensive – Waldi Jun 18 '20 at 18:23
  • this would be sweet – ML33M Jun 18 '20 at 18:27
  • Error during wrapup: 'arg' must be NULL or a character vector. – ML33M Jun 18 '20 at 18:46
  • This error pops up when I use the new unlist(...) code. The first one gave me an error on min_dist, so I must have screwed up the test values. But I have no clue about the error from the unlist version. The idea of the code inside unlist is a nested mapping, right? – ML33M Jun 18 '20 at 18:47
  • Sorry, I made an error in the first solution: I forgot to transpose... see edit. The second one should work. You can remove the unlist to see where the error comes from – Waldi Jun 18 '20 at 19:05
  • Hi @Waldi, don't be sorry. I'm happy you are helping me. For the first solution, what are loops and loopb? For the second, less memory-intensive solution, the same error pops up even if I take off the unlist – ML33M Jun 18 '20 at 20:31
  • It was a test on my side; I forgot to put back n_neighbors.test and min_dist.test. Corrected: see edit – Waldi Jun 18 '20 at 20:33
  • ahh, haha now it makes sense. let me try – ML33M Jun 18 '20 at 20:38
  • actually you mean : expand.grid(n_neighbors.test,min_dist.test) %>% transpose() %>% map(~{umap(df.data,n_neighbors = .x[[1]], min_dist=.x[[2]] )}) – ML33M Jun 18 '20 at 20:40
  • All good, solution one finished running. But how do I visualize the plots in an automated way? If I test 10 x 10 parameters, I will have to plot each one manually (x = , y = ) and use par() to organize the panels. – ML33M Jun 18 '20 at 20:47
  • You can save the results in a list, say plots <- expand.grid... and then: pdf("plots.pdf"); for (i in seq_along(plots)) { plot(plots[[i]]$layout) }; dev.off() – Waldi Jun 18 '20 at 21:10
  • Hi @Waldi, sorry, I've been tinkering around and modifying the pieces. I went back, set up expand.grid and used it for a grid search over the model parameters. I came up with a big piece of code, but I got stuck at the end on how to visualize the results. How could I share it with you for discussion? – ML33M Jun 22 '20 at 19:14
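A sketch of one automated way to view a whole grid search, assuming ggplot2 is acceptable (neither the question nor the answer uses it): stack every run's layout into one long data frame tagged with its parameters, then let facet_grid() draw one panel per combination, with n_neighbors down the rows and min_dist across the columns. `demo.data`, `param_grid`, `all_layouts` and the file name `umap_grid.pdf` are illustrative names.

```r
library(umap)
library(purrr)
library(ggplot2)

# Small random stand-in for df.data (an assumption, to keep the run quick)
set.seed(1)
demo.data <- as.data.frame(replicate(10, rnorm(150)))

param_grid <- expand.grid(min_dist = c(0.1, 0.5), n_neighbors = c(10, 30))

# One data frame per run (its UMAP coordinates tagged with the parameters
# that produced them); rbind() stacks them into one long table
all_layouts <- do.call(rbind, pmap(param_grid, function(min_dist, n_neighbors) {
  fit <- umap(demo.data, n_neighbors = n_neighbors, min_dist = min_dist)
  data.frame(x = fit$layout[, 1], y = fit$layout[, 2],
             n_neighbors = n_neighbors, min_dist = min_dist)
}))

# Panel rows = n_neighbors, panel columns = min_dist: the grid search drawn as a grid
p <- ggplot(all_layouts, aes(x, y)) +
  geom_point(size = 0.5) +
  facet_grid(n_neighbors ~ min_dist, labeller = label_both, scales = "free")

# Save to file; a 10 x 10 grid will likely need a larger page
ggsave("umap_grid.pdf", p, width = 8, height = 8)
```

`scales = "free"` lets each row and column use its own axis range, since UMAP layouts from different settings can differ a lot in spread.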