9

I am developing some functions in Rcpp that operate on big.matrix objects from the bigmemory package. These objects are passed to Rcpp as SEXP objects which i then have to cast to an XPtr<BigMatrix>, and then to a MatrixAccessor object to access elements of the matrix.

For example, if I want to implement a function that get's the diagonal:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::depends(BH, bigmemory)
#include <bigmemory/MatrixAccessor.hpp>

#include <numeric>

// [[Rcpp::export]]
NumericVector GetDiag(SEXP pmat) {
  XPtr<BigMatrix> xpMat(pMat); // allows you to access attributes
  MatrixAccessor<double> mat(*xpMat); // allows you to access matrix elements

  NumericVector diag(xpMat->nrow()); // Assume matrix is square
  for (int i = 0; i < xpMat->nrow(); i++) { 
    diag[i] = mat[i][i];
  }
  return diag;
}

This function works beautifully, provided the big.matrix object in R is filled with doubles.

However, if you call this function on an integer matrix (e.g. diag(as.big.matrix(matrix(1:9, 3))@address)), You get garbage as a result, because the MatrixAccessor has been initialized as <double>.

Internally, big.matrix objects can come in four types:

void typeof(SEXP pMat) {
  XPtr<BigMatrix> xpMat(pMat);
  int type = xpMat->matrix_type();
  type == 1 // char
  type == 2 // short
  type == 4 // int
  type == 8 // double
}

Since all we're doing is accessing the elements of the matrix, the diag function should be able to handle each of these types. But for now, since our function signature is NumericVector, I'll ignore character matrices.

To handle this, I figured I could just throw in a switch statement, initializing the corresponding mat with the appropriate type at runtime:

// [[Rcpp::export]]
NumericVector GetDiag(SEXP pmat) {
  XPtr<BigMatrix> xpMat(pMat);
  // Determine the typeof(pmat), and initialize accordingly:
  switch(xpMat->matrix_type()) {
    case == 1:
    {
       // Function expects to return a NumericVector.
       throw; 
    }
    case == 2:
    {
      MatrixAccessor<short> mat(*xpMat);
      break;
    }
    case == 4:
    {
      MatrixAccessor<int> mat(*xpMat);
      break;
    }
    case == 8:
    {
      MatrixAccessor<double> mat(*xpMat);
    }
  }
  MatrixAccessor<double> mat(*xpMat); // allows you to access matrix elements

  NumericVector diag(xpMat->nrow()); // Assume matrix is square
  for (int i = 0; i < xpMat->nrow(); i++) { 
    diag[i] = mat[i][i];
  }
  return diag;
}

However, this results in compiler errors, because I'm redefining mat after it has been declared already in the first case.

The only way I can see to do this, is to write three different diag functions, one for each type, whose code is the same with the exception of the initialization of mat. Is there a better way?

Scott Ritchie
  • 10,293
  • 3
  • 28
  • 64

2 Answers2

10

In these cases, you usually want to separate the logic: first, you have a dispatch function that marshalls the SEXP type into some compile-time type, and a separate (templated) function that handles the real work. Some (incomplete) example code:

#include <Rcpp.h>
using namespace Rcpp;

// the actual generic implementation
template <typename T>
T GetDiag_impl(T const& pMat) {
  // logic goes here
}

// the dispatch code

// [[Rcpp::export]]
SEXP GetDiag(SEXP pMat) {
  switch (TYPEOF(pMat)) {
  case INTSXP: return GetDiag_impl<IntegerMatrix>(pMat);
  case REALSXP: return GetDiag_impl<NumericMatrix>(pMat);
  <...>
  }
}
Kevin Ushey
  • 20,530
  • 5
  • 56
  • 88
  • Thanks for the pointers, I got there in the end. The template I needed to implement was very different to your answer, so I've added the general design pattern as an answer, and will write an addition to the Rcpp Gallery example for bigmemory. – Scott Ritchie Jul 08 '14 at 06:22
  • 1
    Hey Kevin... I just realized that I'm unable to split the bounty between two users. I intended to award you 50 and @ScottRitchie 50 as both of your post for extremely helpful. Anywho, I'll open another bounty and directly award it to you after the 1 day mandatory waiting period. – Joseph Wood Nov 15 '17 at 17:21
  • Well... there it is!!! 4x the amount I initially intended. I don't do this reluctantly, as the knowledge I've gained from this post alone is worth more than a few SO points. – Joseph Wood Nov 15 '17 at 22:13
9

Thanks to the pointers from Kevin Ushey on separating the dispatching and the function logic. The template code required is sufficiently different to Kevin's suggestions to warrant its own answer.

To write a function that generally works on all types of big.matrix, the pattern is as follows:

#include <Rcpp.h>
using namespace Rcpp;

// [[Rcpp::depends(BH, bigmemory)]]
#include <bigmemory/MatrixAccessor.hpp>

// Generic Implementation of GetDiag:
template <typename T>
NumericVector GetDiag_impl(XPtr<BigMatrix> xpMat, MatrixAccessor<T> mat) {
  NumericVector diag(xpMat->nrow()); // Assume matrix is square
  for (unsigned int i = 0; i < xpMat->nrow(); i++) { 
    diag[i] = mat[i][i];
  }
  return diag;
}

// Dispatch code
// [[Rcpp::export]]
NumericVector GetDiag(SEXP pMat) {
  XPtr<BigMatrix> xpMat(pMat);
  switch(xpMat->matrix_type()) {
    case 1:
      return GetDiag_impl(xpMat, MatrixAccessor<char>(*xpMat));
    case 2:
      return GetDiag_impl(xpMat, MatrixAccessor<short>(*xpMat));
    case 4:
      return GetDiag_impl(xpMat, MatrixAccessor<int>(*xpMat));
    case 8:
      return GetDiag_impl(xpMat, MatrixAccessor<double>(*xpMat));
    default:
      // This should be impossible to reach, but shuts up the compiler
      throw Rcpp::exception("Unknown type of big.matrix detected! Aborting.");
  }
}

You do not need to template the return type of GetDiag_impl: all four big.matrix types are stored as numeric in R (See This Answer for a discussion on 'char' big.matrix objects).

Community
  • 1
  • 1
Scott Ritchie
  • 10,293
  • 3
  • 28
  • 64
  • 1
    Thanks for this, and for moving it over to Rcpp Gallery as amendment to the [bigmemory post](http://gallery.rcpp.org/articles/using-bigmemory-with-rcpp/). – Dirk Eddelbuettel Jul 11 '14 at 14:15