Rcpp - extracting rows from list of matrices / dataframes

Question

As a follow up to this question, I've decided to go down the route of Rcpp vs convoluted syntax in R. I think this will provide better readability (and possibly also be faster).

Let's say I have a list of data.frames (which I can easily convert to matrices via as). Given prior answe -r -s, this seems the best approach.

# input data
my_list <- vector("list", length= 10)
set.seed(65L)
for (i in 1:10) {
  my_list[[i]] <- data.frame(matrix(rnorm(10000),ncol=10))
  # alternatively 
  # my_list[[i]] <- matrix(rnorm(10000),ncol=10)
}

What's the appropriate way to extract rows from the matrices? The goal is to create a list with each list element containing a list of the nrth row of each of the original list's data.frames. I've tried several different syntaxes and keep getting errors:

#include <Rcpp.h>
using namespace Rcpp;
using namespace std:

List foo(const List& my_list, const int& n_geo) {
  int n_list = my_list.size();
  std::vector<std::vector<double> > list2(n_geo);

  // needed code....

  return wrap(list2);
}

options

for (int i = 0; i < n_list; i++) {
  for (int nr = 0; nr < n_geo; nr++) {
    list2[nr][i] = my_list[i].row(nr);
    // or list2[nr].push_back(my_list[i].row(nr));
    // or list2[nr].push_back(as<double>(my_list[i].row(nr)));
    // or list2[nr].push_back(as<double>(my_list[i](nr, _)));
  }
}

// or:
NumericMatrix a = my_list[1] 
... 
NumericMatrix j = my_list[10]

for (int nr = 0; nr < n_geo; nr++) {
  list2[nr][1] = // as above
}

None of these are working for me. What am I doing wrong? Here are the errors I receive from my above syntax choices.

error: no matching function for call to 'as(Rcpp::Matrix<14>::Row)'

or

error: cannot convert 'Rcpp::Matrix<14>::Row {aka Rcpp::MatrixRow<14>}' to 'double' in assignment

Your question is a little unclear to me. Can show example R objects for your input (corresponding to `my_list`) and desired output? — nrussell, Mar 14 '16 at 19:42
So you are trying to write the operation that creates `l2` in your other question, using Rcpp? — nrussell, Mar 14 '16 at 20:36

nrussell · Accepted Answer · 2016-03-14T22:28:41.183

Here is one way to do it:

#include <Rcpp.h>

// x[[nx]][ny,]  ->  y[[ny]][[nx]]

// [[Rcpp::export]]
Rcpp::List Transform(Rcpp::List x) {
    R_xlen_t nx = x.size(), ny = Rcpp::as<Rcpp::NumericMatrix>(x[0]).nrow();
    Rcpp::List y(ny);

    for (R_xlen_t iy = 0; iy < ny; iy++) {
        Rcpp::List tmp(nx);
        for (R_xlen_t ix = 0; ix < nx; ix++) {
            Rcpp::NumericMatrix mtmp = Rcpp::as<Rcpp::NumericMatrix>(x[ix]);
            tmp[ix] = mtmp.row(iy);
        }
        y[iy] = tmp;
    }

    return y;
}

/*** R

L1 <- lapply(1:10, function(x) {
    matrix(rnorm(20), ncol = 5)
})

L2 <- lapply(1:nrow(L1[[1]]), function(x) {
    lapply(L1, function(y) unlist(y[x,]))
})

all.equal(L2, Transform(L1))
#[1] TRUE

microbenchmark::microbenchmark(
    "R" = lapply(1:nrow(L1[[1]]), function(x) {
        lapply(L1, function(y) unlist(y[x,]))
    }),
    "Cpp" = Transform(L1),
    times = 200L)

#Unit: microseconds
#expr    min      lq      mean  median       uq      max neval
#  R 254.660 316.627 383.92739 347.547 392.7705 1909.097   200
#Cpp  18.314  26.007  71.58795  30.230  38.8650  945.167   200

*/

I'm not sure how this will scale; I think it is just an inherently inefficient transformation. As per my comment at the top of the source, it seems like you are just doing a sort of coordinate swap -- the nyth row of the nxth element of the input list becomes the nxth element of the nyth element of the output list:

x[[nx]][ny,]  ->  y[[ny]][[nx]]

To address the errors you were getting, Rcpp::List is a generic object - technically an Rcpp::Vector<VECSXP> - so when you try to do, e.g.

my_list[i].row(nr)

the compiler doesn't know that my_list[i] is a NumericMatrix. Therefore, you have to make an explicit cast with Rcpp::as<>,

Rcpp::NumericMatrix mtmp = Rcpp::as<Rcpp::NumericMatrix>(x[ix]);
tmp[ix] = mtmp.row(iy);

I just used matrix elements in the example data to simplify things. In practice you are probably better off coercing data.frames to matrix objects directly in R than trying to do it in C++; it will be much simpler, and most likely, the coercion is just calling underlying C code, so there isn't really anything to be gained trying to do it otherwise.

I should also point out that if you are using a Rcpp::List of homogeneous types, you can squeeze out a little more performance with Rcpp::ListOf<type>. This will allow you to skip the Rcpp::as<type> conversions done above:

typedef Rcpp::ListOf<Rcpp::NumericMatrix> MatList;

// [[Rcpp::export]]
Rcpp::List Transform2(MatList x) {
    R_xlen_t nx = x.size(), ny = x[0].nrow();
    Rcpp::List y(ny);

    for (R_xlen_t iy = 0; iy < ny; iy++) {
        Rcpp::List tmp(nx);
        for (R_xlen_t ix = 0; ix < nx; ix++) {
            tmp[ix] = x[ix].row(iy);
        }
        y[iy] = tmp;
    }

    return y;
}

/*** R

L1 <- lapply(1:10, function(x) {
    matrix(rnorm(20000), ncol = 100)
})

L2 <- lapply(1:nrow(L1[[1]]), function(x) {
    lapply(L1, function(y) unlist(y[x,]))
})

microbenchmark::microbenchmark(
    "R" = lapply(1:nrow(L1[[1]]), function(x) {
        lapply(L1, function(y) unlist(y[x,]))
    }),
    "Transform" = Transform(L1),
    "Transform2" = Transform2(L1),
    times = 200L)

#Unit: microseconds
#      expr      min       lq     mean   median       uq       max neval
#         R 6049.594 6318.822 7604.871 6707.242 8592.510 64005.190   200
# Transform  928.468 1041.936 3130.959 1166.819 1659.745 71552.284   200
#Transform2  850.912  957.918 1694.329 1061.183 2856.724  4502.065   200

*/

Thanks for the continued edits. I'm getting a ~11x speed up vs my naive R method, which is better than the ~8.5x speed up via the prior solution from sgibbs... And, as originally noted, the readability is substantially improved. — alexwhitworth, Mar 14 '16 at 22:43

Rcpp - extracting rows from list of matrices / dataframes

1 Answers1