
I'm having a hard time implementing a function with the Rcpp package using cppFunction. I need something like R's intersect that takes two NumericVector arguments and returns another NumericVector with the result, just like in R.

This document has been of some help but unfortunately I'm pretty much a noob in C++ atm.

How could I implement R's intersect function with cppFunction?

Thanks

Pane

1 Answer


You would probably want to use something like std::unordered_set to implement intersect:

File myintersect.cpp:

#include <Rcpp.h>
#include <unordered_set>  // std::unordered_set
#include <vector>         // std::vector
using namespace Rcpp;

// Enable C++11 via this plugin (Rcpp 0.10.3 or later)
// [[Rcpp::plugins(cpp11)]]

// [[Rcpp::export]]
NumericVector myintersect(NumericVector x, NumericVector y) {
    std::vector<double> res;
    std::unordered_set<double> s(y.begin(), y.end());
    for (int i=0; i < x.size(); ++i) {
        auto f = s.find(x[i]);
        if (f != s.end()) {
            res.push_back(x[i]);
            s.erase(f);
        }
    }
    return Rcpp::wrap(res);
}

We can load the function and verify it works:

library(Rcpp)
sourceCpp(file="myintersect.cpp")

set.seed(144)
x <- c(-1, -1, sample(seq(1000000), 10000, replace=TRUE))
y <- c(-1, sample(seq(1000000), 10000, replace=TRUE))
all.equal(intersect(x, y), myintersect(x, y))
# [1] TRUE
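
The check passes because each matched value is erased from the set, so a value is emitted only once and in the order it first appears in x. A minimal standalone sketch of the same loop in plain C++ makes that behaviour easy to test outside R. It uses std::vector instead of NumericVector, and the name intersect_first_seen is hypothetical:

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical standalone version of the myintersect loop, using
// std::vector<double> so it compiles without R or Rcpp.
std::vector<double> intersect_first_seen(const std::vector<double>& x,
                                         const std::vector<double>& y) {
    // Values of y that have not been matched yet.
    std::unordered_set<double> remaining(y.begin(), y.end());
    std::vector<double> res;
    for (std::size_t i = 0; i < x.size(); ++i) {
        auto it = remaining.find(x[i]);
        if (it != remaining.end()) {
            res.push_back(x[i]);   // keep the value in x's order
            remaining.erase(it);   // a repeated x[i] is not emitted twice
        }
    }
    return res;
}
```

For example, with x = (-1, -1, 3, 2, 3) and y = (2, -1, 5, 3), the result is (-1, 3, 2): duplicates in x are dropped and the order follows x, not y.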

However, this approach appears to be a good deal slower than R's built-in intersect function, by roughly 3.6x at the median in this benchmark:

library(microbenchmark)
microbenchmark(intersect(x, y), myintersect(x, y))
# Unit: microseconds
#               expr      min       lq   median        uq      max neval
#    intersect(x, y)  424.167  495.861  501.919  523.7835  989.997   100
#  myintersect(x, y) 1778.609 1798.111 1808.575 1835.1570 2571.426   100
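
One variation that may be worth benchmarking avoids hashing altogether: sort and de-duplicate copies of both inputs, then merge them with std::set_intersection. The following is only a sketch in plain C++ with hypothetical names (sorted_unique, sorted_intersect); it has not been timed here, it assumes the inputs contain no NaN values, and unlike R's intersect it returns the values in ascending order rather than in the order of x.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Hypothetical helper: return a sorted copy of v with duplicates removed.
std::vector<double> sorted_unique(std::vector<double> v) {
    std::sort(v.begin(), v.end());
    v.erase(std::unique(v.begin(), v.end()), v.end());
    return v;
}

// Hypothetical sort-based intersection; the result is in ascending order.
std::vector<double> sorted_intersect(const std::vector<double>& x,
                                     const std::vector<double>& y) {
    const std::vector<double> a = sorted_unique(x);
    const std::vector<double> b = sorted_unique(y);
    std::vector<double> res;
    res.reserve(std::min(a.size(), b.size()));  // upper bound on the result size
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(res));
    return res;
}
```

To call this from R, the inputs and result would still need converting to and from NumericVector, for instance with Rcpp::as and Rcpp::wrap, and that conversion cost would count towards any timing.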
josliber
  • Nice. FWIW we also have an ugly uppercase macro giving us `RCPP_UNORDERED_SET` irrespective of the compiler. But asking via the plugin is more elegant :) – Dirk Eddelbuettel Apr 11 '14 at 15:08
  • Ah nice! `std::tr1::unordered_set` wasn't working for me, so it's good to know `RCPP_UNORDERED_SET` exists. – josliber Apr 11 '14 at 15:19
  • In addition, there are sugar functions for this, e.g. `intersect` is already available -- but it returns output in a different order than `R`'s `intersect`. (You get the same values, though) – Kevin Ushey Apr 11 '14 at 18:03
  • `push_back` is killing it. Deep copies of all the data every time you use it. Don't `push_back` on Rcpp vectors, ever. FYI, `push_back` ... have been removed from Rcpp11. @KevinUshey sugar's version does even worse. – Romain Francois Apr 12 '14 at 06:06
  • @RomainFrancois thanks -- I followed [Dirk's advice](http://stackoverflow.com/questions/13782943/how-to-resize-a-numericvector) and converted it to a `std::vector`, but it seems to still be a good deal slower than `intersect`. – josliber Apr 12 '14 at 14:56