
I am trying to run a QR decomposition (LAPACKE_dgeqrf) in R on a Linux machine (CentOS) using a C++ program that is interfaced with Rcpp. Unfortunately, top shows only 100% CPU usage, i.e. a single core is in use. This also happens on a Red Hat Enterprise Linux server. However, the same C++ program (with LAPACKE_dgeqrf) runs at nthreads * 100% when started from the terminal, independently of R. I compiled OpenBLAS with

NO_AFFINITY=1 

and tried

export OPENBLAS_NUM_THREADS=4
export GOTO_NUM_THREADS=4
export OMP_NUM_THREADS=4
export OPENBLAS_MAIN_FREE=1

Nothing works. Everything works fine on a Mac though. 'mcaffinity()' from the parallel R package returns NULL. I configured R using
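One way to check from the shell whether these variables actually reach the R process, and whether that process is restricted to a single core, might look like the sketch below. The helper names `blas_env` and `allowed_cpus` are made up for illustration. Note that `/proc/<pid>/environ` only reflects the environment the process was started with, so variables set later from inside R would not show up there.

```shell
#!/bin/sh
# Hypothetical helpers; the names are invented for this sketch.

# Print the OpenBLAS/OpenMP-related variables a process inherited at startup.
# $1: PID of the process to inspect
blas_env() {
    tr '\0' '\n' < "/proc/$1/environ" \
        | grep -E '^(OPENBLAS|GOTO|OMP)_' \
        || echo "no BLAS thread variables set"
}

# Print the CPUs the process is allowed to run on, e.g. "0-7" or just "0".
# $1: PID of the process to inspect
allowed_cpus() {
    awk '/^Cpus_allowed_list:/ {print $2}' "/proc/$1/status"
}

# Example: inspect the current shell; replace $$ with the PID of the R session.
blas_env $$
allowed_cpus $$
```

If `allowed_cpus` prints a single CPU for the R session while the standalone program shows the full range, the process is being pinned rather than simply running with too few threads.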

configure  'CFLAGS=-g -O3 -Wall -pedantic' 'CXXFLAGS=-g -O3 -Wall -pedantic' 'FCFLAGS=-g -O3' 'F77FLAGS=-g -O3' '--with-system-zlib' '--enable-memory-profiling'

My C++ code:

#include <algorithm>  // std::min
#include <Rcpp.h>
#include <lapacke.h>
#include <cblas.h>

//[[Rcpp::export]]
Rcpp::NumericMatrix QRopenblas(Rcpp::NumericMatrix X)
{
    // Declare variables 
    int n_rows = X.nrow(), n_cols = X.ncol(), min_mn = std::min(n_rows, n_cols);
    Rcpp::NumericVector tau(min_mn);

    // Perform QR decomposition
    LAPACKE_dgeqrf(CblasColMajor, n_rows, n_cols, X.begin(), n_rows, tau.begin());

    return X;
}

My R code:

PKG_LIBS <- '/pathto/openblas/lib/libopenblas.a' 
PKG_CPPFLAGS <- '-I/pathto/openblas/include'
Sys.setenv(PKG_LIBS = PKG_LIBS , PKG_CPPFLAGS = PKG_CPPFLAGS) 
Rcpp::sourceCpp('/pathto/QRopenblas.cpp', rebuild = TRUE)

n_row <- 4000
n_col <- 4000
A <- matrix(rnorm(n_row * n_col), n_row, n_col)
res <- QRopenblas(A)
Dirk Eddelbuettel
chris
  • Hm. I'd simplify: try a small C++ standalone, see if this works. Rcpp does not set/reset the number of cores, and you seem to use the right env. vars. – Dirk Eddelbuettel Feb 12 '14 at 14:38
  • I tried a standalone C program and it did work :-) I see more than 100% using top. It must be some issue with R. Didn't you have a similar issue as described here: http://lists.r-forge.r-project.org/pipermail/rcpp-devel/2012-June/003903.html – chris Feb 12 '14 at 14:44
  • Yes. There was something -- there is a post by Claudia Beleites here somewhere. Look for OpenBLAS and parallel. And I think someone (Simon?) wrote a package or function to set this.. – Dirk Eddelbuettel Feb 12 '14 at 14:46
  • I already spotted that! Unfortunately, it does not help. The weird thing is that it works fine on a Mac. – chris Feb 12 '14 at 14:57
  • Hi chris, can you try running the code at [this gist](https://gist.github.com/kevinushey/b61a984f71a7c9043fad) and tell me the results? I want to see if removing Rcpp from the equation changes the result; ie, if it's a pure R issue or may be related to Rcpp. – Kevin Ushey Feb 12 '14 at 19:29
  • I could make it run above 100% using the pure R code on the Red Hat server and OPENBLAS_MAIN_FREE=1 (without it would run at 100%). On the other server (CentOS) using a standard OpenBLAS install with OPENBLAS_MAIN_FREE=1 it still runs at 100%. – chris Feb 12 '14 at 20:01
  • Btw, OpenBLAS runs also above 100% with Rcpp on the Red Hat server. – chris Feb 12 '14 at 21:31
  • It's hard to say. I can only guess there is either an R specific setting, or session specific setting (something hiding in a `.bash_profile`, for example) that is getting in the way on CentOS. Not too sure how much more we can help from this side... – Kevin Ushey Feb 13 '14 at 05:59

1 Answer


I found a solution by rebuilding R and configuring it using

../configure --enable-BLAS-shlib --enable-R-shlib --enable-memory-profiling --with-tcltk=no

Afterwards, I had to replace libRblas.so with the corresponding OpenBLAS file, libopenblas.so. Btw, I built OpenBLAS with standard settings (i.e. with affinity). The R function qr() now uses all cores, and so do the C++ programs. The reason this works is that R is now launched with multiple threads at startup (as verified with cat /proc/<pid>/status). Without replacing libRblas.so, R is launched with one thread, and the threads that OpenBLAS starts when it is first called are then probably all pinned to the first core.
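For reference, the library swap and the thread check described above could be sketched roughly as follows. This is only an illustration: the helper names `swap_rblas` and `thread_count` are invented, the OpenBLAS path in the usage comment is a placeholder, and the assumption that libRblas.so sits in the lib directory under the path printed by `R RHOME` should be checked against the actual install.

```shell
#!/bin/sh
# Sketch only: helper names are invented, and the paths in the usage
# comments at the bottom are placeholders.

# Back up R's reference BLAS and point libRblas.so at OpenBLAS instead.
# $1: directory holding libRblas.so   $2: path to libopenblas.so
swap_rblas() {
    mv "$1/libRblas.so" "$1/libRblas.so.orig" &&
    ln -s "$2" "$1/libRblas.so"
}

# Number of threads a process currently has, read from /proc/<pid>/status.
# $1: PID of the process to inspect
thread_count() {
    awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# Usage (not run here):
#   swap_rblas "$(R RHOME)/lib" /pathto/openblas/lib/libopenblas.so
#   thread_count <pid of the R session>
```

Keeping the original file as libRblas.so.orig makes it easy to switch back to the reference BLAS if the swapped library causes problems.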

chris
  • Well, doh, yes, of course. The Debian package (which I've been looking after for a decade+) *always* uses shared library builds so that you can swap Atlas / OpenBLAS, MKL, AMD's variant, ... in and out. If you had a built-in LAPACK/BLAS config you *obviously* could not go multi-core. This has **nothing** to do with Rcpp but is all about your R config, so shall we remove the Rcpp tag? – Dirk Eddelbuettel Feb 13 '14 at 14:32
  • I removed the Rcpp tag from your post. – Dirk Eddelbuettel Feb 13 '14 at 15:14