I need to compute singular value decomposition (SVD) of an array of matrices stored in a RcppArmadillo cube. I would like to do the SVDs in parallel for each matrix and want to enable multi-threading in calculating the SVDs via OpenBLAS. For that I use the following Cpp script:
field<vec> sing_vals(S);
#pragma omp parallel for
for(unsigned s=0;s<S;++s){
svd( sing_vals(s), X.slice(s) );
}
Now before compiling this Cpp file from R via sourceCpp, I set the number of OpenMP and OpenBLAS threads with the following script:
library(OpenMPController )
omp_set_num_threads(nomp)
Sys.setenv("PKG_CXXFLAGS"="-fopenmp")
Sys.setenv("PKG_LIBS"="-fopenmp")
#########################set OPENBLAS nthreads
require(inline)
openblas.set.num.threads <- cfunction( signature(ipt="integer"),
body = 'openblas_set_num_threads(*ipt);',
otherdefs = c ('extern void openblas_set_num_threads(int);'),
libargs = c ('-L/opt/openblas/lib -lopenblas'),
language = "C",
convention = ".C"
)
openblas.set.num.threads(nblas)
I have to set either of nblas and nomp to 1. Else I get the message
OpenBLAS Warning : Detect OpenMP Loop and this application may hang. Please rebuild the library with USE_OPENMP=1 option
How can I do this nested parallelization in this exact problem? I read some materials on nested parallelization but could not figure out how to do it in my case.