I run the sample code below to simulate values, and underneath is a snapshot of the usage of my 4 cores while it runs. It takes a while before all cores are used at full capacity. I'd like to understand what's going on and, ultimately, how to make it faster.
library(doParallel)
library(data.table)

# 10 million rows of random beta parameters and quantiles
data <- data.table(a = runif(10000000), b = runif(10000000), quantile = runif(10000000))

# Number of 1-million-row chunks (the last chunk holds the remainder)
e <- nrow(data) %/% 1000000 + 1
dataSplit <- split(data[], seq_len(nrow(data)) %/% 1000000)

# Vectorised over a whole chunk: one qbeta call per chunk
qbetaVec <- function(lossvalues) qbeta(lossvalues$quantile, lossvalues$a, lossvalues$b)

cl <- makeCluster(4)
registerDoParallel(cl)
res2 <- foreach(i = 1:e) %dopar% qbetaVec(dataSplit[[i]])
res3 <- unlist(res2)
stopCluster(cl)
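For comparison, here is how I would time a fully serial baseline (a sketch; since qbeta is already vectorised, no splitting or cluster is needed, and res_serial is just an illustrative name):

```r
library(data.table)

data <- data.table(a = runif(10000000), b = runif(10000000), quantile = runif(10000000))
qbetaVec <- function(lossvalues) qbeta(lossvalues$quantile, lossvalues$a, lossvalues$b)

# Single-threaded baseline: one vectorised qbeta call over all 10M rows
system.time(res_serial <- qbetaVec(data))
```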
The parallel version takes about 67 seconds to complete on my machine. I watched the performance monitor while res2 was being computed, and it takes a while before all 4 cores are running at full capacity. What is the reason for this? Is it unavoidable? What is happening before all cores are fully utilized? Would it be faster to try this with RcppParallel?