Building a tiny R package with CUDA and Rcpp

Question

I'm working on a tiny R package that uses CUDA and Rcpp, adapted from the output of Rcpp.package.skeleton(). I will first describe what happens on the master branch for the commit entitled "fixed namespace". The package installs successfully if I forget CUDA (i.e., if I remove the src/Makefile, change src/rcppcuda.cu to src/rcppcuda.cpp, and comment out the code that defines and calls kernels). But as is, the compilation fails.

I also would like to know how to compile with a Makevars or Makevars.in instead of a Makefile, and in general, try to make this as platform independent as is realistic. I've read about Makevars in the R extensions manual, but I still haven't been able to make it work.

Some of you may suggest rCUDA, but what I'm really after here is improving a big package I've already been developing for some time, and I'm not sure that switching is worth starting again from scratch.

Anyway, here's what happens when I do an R CMD build and R CMD INSTALL on this one (master branch, commit entitled "fixed namespace").

* installing to library ‘/home/landau/.R/library’
* installing *source* package ‘rcppcuda’ ...
** libs
** arch - 
/usr/local/cuda/bin/nvcc -c rcppcuda.cu -o rcppcuda.o --shared -Xcompiler "-fPIC" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I/apps/R-3.2.0/include -I/usr/local/cuda/include 
rcppcuda.cu:1:18: error: Rcpp.h: No such file or directory
make: *** [rcppcuda.o] Error 1
ERROR: compilation failed for package ‘rcppcuda’
* removing ‘/home/landau/.R/library/rcppcuda’

...which is strange, because I do include Rcpp.h, and Rcpp is installed.

$ R

R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

...

> library(Rcpp)
> sessionInfo()
R version 3.2.0 (2015-04-16)
Platform: x86_64-unknown-linux-gnu (64-bit)
Running under: CentOS release 6.6 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] Rcpp_0.11.6
>

I'm using CentOS,

$ cat /etc/*-release
CentOS release 6.6 (Final)
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
CentOS release 6.6 (Final)
CentOS release 6.6 (Final)

CUDA version 6,

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2013 NVIDIA Corporation
Built on Thu_Mar_13_11:58:58_PDT_2014
Cuda compilation tools, release 6.0, V6.0.1

and I have access to 4 GPUs of the same make and model.

$ /usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery 
/usr/local/cuda/samples/bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 4 CUDA Capable device(s)

Device 0: "Tesla M2070"
  CUDA Driver Version / Runtime Version          6.0 / 6.0
  CUDA Capability Major/Minor version number:    2.0
  Total amount of global memory:                 5375 MBytes (5636554752 bytes)
  (14) Multiprocessors, ( 32) CUDA Cores/MP:     448 CUDA Cores
  GPU Clock rate:                                1147 MHz (1.15 GHz)
  Memory Clock rate:                             1566 Mhz
  Memory Bus Width:                              384-bit
  L2 Cache Size:                                 786432 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(65536), 2D=(65536, 65535), 3D=(2048, 2048, 2048)
  Maximum Layered 1D Texture Size, (num) layers  1D=(16384), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(16384, 16384), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 32768
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (65535, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Enabled
  Device supports Unified Addressing (UVA):      Yes
  Device PCI Bus ID / PCI location ID:           11 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

...

> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU0) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU3) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU1) : No
> Peer access from Tesla M2070 (GPU1) -> Tesla M2070 (GPU2) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU2) -> Tesla M2070 (GPU2) : No
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU0) : Yes
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU1) : Yes
> Peer access from Tesla M2070 (GPU3) -> Tesla M2070 (GPU2) : Yes

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 6.0, CUDA Runtime Version = 6.0, NumDevs = 4, Device0 = Tesla M2070, Device1 = Tesla M2070, Device2 = Tesla M2070, Device3 = Tesla M2070
Result = PASS

Edit: it compiles for any commit after "fixed namespace" on either branch, but there are still problems with combining Rcpp and CUDA

To make the package compile, it turns out that I just needed to separate my C++ and CUDA code into separate *.cpp and *.cu files. However, when I try the "compiling cpp and cu separately" commit on the master branch, I get

> library(rcppcuda)
> hello()
An object of class "MyClass"
Slot "x":
 [1]  1  2  3  4  5  6  7  8  9 10

Slot "y":
 [1]  1  2  3  4  5  6  7  8  9 10

Error in .Call("someCPPcode", r) : 
  "someCPPcode" not resolved from current namespace (rcppcuda)
>

The error goes away in the withoutCUDA branch in the commit entitled "adding branch withoutCUDA".

> library(rcppcuda)
> hello()
An object of class "MyClass"
Slot "x":
 [1]  1  2  3  4  5  6  7  8  9 10

Slot "y":
 [1]  1  2  3  4  5  6  7  8  9 10

[1] "Object changed."
An object of class "MyClass"
Slot "x":
 [1] 500   2   3   4   5   6   7   8   9  10

Slot "y":
 [1]    1 1000    3    4    5    6    7    8    9   10

>

The only differences between the "compiling cpp and cu separately" commit on master and the "adding branch withoutCUDA" commit on withoutCUDA are

The Makefile and someCUDAcode.cu are gone from withoutCUDA.
In withoutCUDA, all references to someCUDAcode() are gone from someCPPcode.cpp.

Also, it would still be convenient be able to use CUDA and Rcpp in the same *.cu file. I would really like to know how to fix the "fixed namespace" commit on the master branch.

cdeterman · Accepted Answer · 2015-06-08T19:56:35.730

Going through your package there are multiple aspects that need to be changed.

You shouldn't use a 'Makefile' but a 'Makevars' file instead to improve compatibility for multiple architecture builds.
Try to follow the standard variable names (e.g. CPPC should be CXX), this makes everything play together much better.
Don't try and compile the shared object yourself, there are good macros within the base R makefile that make this much simpler (e.g. PKG_LIBS, OBJECTS, etc.)
With multiple compilers, you will want to use the OBJECTS macro. Here you will override R's base attempt to set the object files to be linked (make sure you include them all).
You also need (AFAIK) to make CUDA functions available with extern "C". You will prefix both the function in the .cu file and when you declare it at the start of your cpp file.

The following Makevars worked for me whereby I modified my CUDA_HOME, R_HOME, and RCPP_INC (switched back for you). Note, this is where a configure file is recommended to make the package as portable as possible.

CUDA_HOME = /usr/local/cuda
R_HOME = /apps/R-3.2.0
CXX = /usr/bin/g++

# This defines what the shared object libraries will be
PKG_LIBS= -L/usr/local/cuda-7.0/lib64 -Wl,-rpath,/usr/local/cuda-7.0/lib64 -lcudart -d


#########################################

R_INC = /usr/share/R/include
RCPP_INC = $(R_HOME)/library/Rcpp/include

NVCC = $(CUDA_HOME)/bin/nvcc
CUDA_INC = $(CUDA_HOME)/include 
CUDA_LIB = $(CUDA_HOME)/lib64

LIBS = -lcudart -d
NVCC_FLAGS = -Xcompiler "-fPIC" -gencode arch=compute_20,code=sm_20 -gencode arch=compute_30,code=sm_30 -gencode arch=compute_35,code=sm_35 -I$(R_INC)

### Define objects
cu_sources := $(wildcard *cu)
cu_sharedlibs := $(patsubst %.cu, %.o,$(cu_sources))

cpp_sources := $(wildcard *.cpp)
cpp_sharedlibs := $(patsubst %.cpp, %.o, $(cpp_sources))

OBJECTS = $(cu_sharedlibs) $(cpp_sharedlibs)

all : rcppcuda.so

rcppcuda.so: $(OBJECTS)

%.o: %.cpp $(cpp_sources)
        $(CXX) $< -c -fPIC -I$(R_INC) -I$(RCPP_INC)

%.o: %.cu $(cu_sources)
        $(NVCC) $(NVCC_FLAGS) -I$(CUDA_INC) $< -c

A follow-up point (as you say this is a learning exercise):

A. You aren't using one of the parts of Rcpp that make it such a wonderful package, namely 'attributes'. Here is how your cpp file should look:

#include <Rcpp.h>
using namespace Rcpp;

extern "C"
void someCUDAcode();

//[[Rcpp::export]]
SEXP someCPPcode(SEXP r) {
  S4 c(r);
  double *x = REAL(c.slot("x"));
  int *y = INTEGER(c.slot("y"));
  x[0] = 500.0;
  y[1] = 1000;
  someCUDAcode();
  return R_NilValue;
}

This will automatically generate the corresponding RcppExports.cpp and RcppExports.R files and you no longer need a .Call function yourself. You just call the function. Now .Call('someCPPcode', r) becomes someCPPcode(r) :)

For completeness, here is the updated someCUDAcode.cu file:

__global__ void mykernel(int a){
  int id = threadIdx.x;
  int b = a;
  b++;
  id++;
}


extern "C"
void someCUDAcode() {
  mykernel<<<1, 1>>>(1);
}

With respect to a configure file (using autoconf), you are welcome to check out my gpuRcuda package using Rcpp, CUDA, and ViennaCL (a C++ GPU computing library).

This solves both problems I was after: using Makevars correctly and using Rcpp with CUDA. Thanks! — landau, Jun 09 '15 at 15:03
I should mention, though, that making the changes to allow `someCPPcode(r)` instead of `.Call('someCPPcode', r)` didn't work for me. I couldn't access the `someCPPcode` function within R. — landau, Jun 09 '15 at 15:14
hmm... check `RcppExports.R` to make sure the function has been created. You may need to run `Rcpp::compileAttributes()` manually (you shouldn't have to though). This worked perfectly for me after cloning your repo. What error are you receiving? — cdeterman, Jun 09 '15 at 15:18
`library(rcppcuda); hello()` resulted in `Error in hello() : could not find function "someCPPcode"` because I had no RcppExports* files. But after I manually called `Rcpp::compileAttributes()`, `hello()` ran just fine. When does `compileAttributes()` get called automatically? During the installation? — landau, Jun 09 '15 at 18:06
One last issue: when I do an `R CMD check` on this package (and the larger one I'm seriously working on), I get `* checking if this is a source package ... WARNING Subdirectory ‘src’ contains: someCUDAcode.cu These are unlikely file names for src files.` How do I tell R that *.cu files are source files? — landau, Jun 09 '15 at 18:07
@landau, `compileAttributes()` is called only during development (AFAIK). Not sure if you are developing in Rstudio or not but typically it is called with each re-building of your package in Rstudio but good sanity check to call yourself. The result export files are part of the package so it isn't called during installation. Second, the warning about `cu` files is likely unavoidable as R is telling you it doesn't recognize them, intended to prevent garbage files. R just hasn't been designed with CUDA in mind. — cdeterman, Jun 09 '15 at 18:34
Now that you mention it, I think would rather call `compileAttributes()` myself so I can add `roxygen2` documentation at the top of the `RcppExports*` files. As for the `*.cu` file warning, I think I'll write a separate post about it in case anyone knows how to make it go away. I want to submit to Bioconductor, and they require a completely clean `R CMD check`. Thanks for all your help! — landau, Jun 10 '15 at 03:04

score 2 · Answer 2 · answered Jun 03 '15 at 21:17

2

Several packages on CRAN use GPUs via CUDA:

I would start with these.

answered Jun 03 '15 at 21:17

Dirk Eddelbuettel

360,940
56
644
725

I agree that these are good examples of combining R and CUDA, but none uses `Rcpp`, and each relies on a Makefile or Makefile.in rather than a Makevars. [`WideLM`](http://cran.r-project.org/web/packages/WideLM/index.html) does use both `Rcpp` and CUDA, but [it won't install on the machine I'm using](http://stackoverflow.com/questions/30631612/widelm-installation-error), so I don't know how much good it will do to dig into the source at this point. – landau Jun 03 '15 at 22:08
Shucks. Good point. Forgot about WideLM. Its author has a new package for Rcpp, RcppArmadillo and CUDA forthcoming so you may want to get in touch. Tell Mark I said Hi :) – Dirk Eddelbuettel Jun 03 '15 at 22:09

Building a tiny R package with CUDA and Rcpp

2 Answers2

Linked