Rcpp/RcppArmadillo: removing non-contiguous elements from a vector based on positions

Question

Let's say I have a vector [2,4,6,8,10], and I need to remove the 2nd and the 4th elements from this vector. The desired resulting vector should be [2,6,10]. This is very easy to implement in R:

v1 <- c(2,4,6,8,10)
v1[-c(2,4)]

But how do I implement this in Rcpp/RcppArmadillo? I can figure out the contiguous case (i.e. removing the 2nd through the 4th elements) by using the .erase() function, but the non-contiguous case doesn't seem so obvious to me since .erase does not seem to accept uvec type of vectors. Speed could be a consideration because v1 could be quite large in my application.

EDIT: Either Rcpp or Armadillo implementation is fine by me as I am using both.

This should help: http://gallery.rcpp.org/articles/armadillo-subsetting/ — Dirk Eddelbuettel, Dec 31 '15 at 18:39
You are saying `Rcpp/RcppArmadillo`, but Rcpp vectors and armadillo vectors have different interfaces. Please clarify this. — nrussell, Dec 31 '15 at 18:42
@DirkEddelbuettel: I actually read that page before asking this question. Maybe I missed something but I still can't figure out how to remove elements instead of obtaining elements defined by a `uvec` or `umat`. Could you please elaborate? — aenima, Dec 31 '15 at 18:58
@nrussell: Good point. I have edited the original question to clarify on this. — aenima, Dec 31 '15 at 18:59
By the way it's R-**c**-**p**-**p**, not *Rccp* (c-p-p as in c-plus-plus). — nrussell, Dec 31 '15 at 19:01
@aenima Show us what you tried. We're not even to write code for you, even if nrussell is exceedingly generous with his time. — Dirk Eddelbuettel, Dec 31 '15 at 19:42
@DirkEddelbuettel Honestly I wasn't able to come up with anything even remotely working. Looking at nrussell's code, I realized the culprit was I didn't know that I could subset a vector using `LogicalVector` (although I know you can do this in R). I thought I could only use `uvec`. Had I known this I would have probably come up with something similar. I presume you are one of the devs for Rcpp and I am wondering: do you think this functionality (removing elements) should be implemented natively for Rcpp? It is after all a very commonly used trick in R... — aenima, Dec 31 '15 at 21:14
You are randomly mixing concepts and data types from Rcpp and RcppArmadillo. A _statically-typed language like C++_ requires a bit more care. To learn, maybe study existing and working examples, and modify them. — Dirk Eddelbuettel, Dec 31 '15 at 21:17

nrussell · Accepted Answer · 2015-12-31T23:16:10.077

2

Here's one possible approach:

#include <Rcpp.h>

Rcpp::LogicalVector logical_index(Rcpp::IntegerVector idx, R_xlen_t n) {
  bool invert = false; 
  Rcpp::LogicalVector result(n, false);

  for (R_xlen_t i = 0; i < idx.size(); i++) {
    if (!invert && idx[i] < 0) invert = true;
    result[std::abs(idx[i])] = true;
  }

  if (!invert) return result;
  return !result;
}


// [[Rcpp::export]]
Rcpp::NumericVector 
Subset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
  return x[logical_index(idx, x.size())];
}

x <- seq(2, 10, 2)

x[c(2, 4)]
#[1] 4 8
Subset(x, c(1, 3))
#[1] 4 8

x[-c(2, 4)]
#[1]  2  6 10
Subset(x, -c(1, 3))
#[1]  2  6 10

Note that the indices for the Rcpp function are 0-based, as they are processed in C++.

I abstracted the subsetting logic into its own function, logical_index, which converts an IntegerVector to a LogicalVector in order to be able to "decide" whether to drop or keep the specified elements (e.g. by inverting the result). I suppose this could be done with integer-based subsetting as well, but it should not matter either way.

Like vector subsetting in R, a vector of all negative indices means to drop the corresponding elements; whereas a vector of all positive indices indicates the elements to keep. I did not check for mixed cases, which should probably throw an exception, as R will do.

Regarding my last comment, it would probably be more sensible to rely on Rcpp's native overloads for ordinary subsetting, and have a dedicated function for negated subsetting (R's x[-c(...)] construct), rather than mixing functionality as above. There are pre-existing sugar expressions for creating such a function, e.g.

#include <Rcpp.h>

template <int RTYPE>
inline Rcpp::Vector<RTYPE> 
anti_subset(const Rcpp::Vector<RTYPE>& x, Rcpp::IntegerVector idx) {
  Rcpp::IntegerVector xi = Rcpp::seq(0, x.size() - 1);
  return x[Rcpp::setdiff(xi, idx)];
}

// [[Rcpp::export]]
Rcpp::NumericVector 
AntiSubset(Rcpp::NumericVector x, Rcpp::IntegerVector idx) {
  return anti_subset(x, idx);
}

/*** R

x <- seq(2, 10, 2)

x[-c(2, 4)]
#[1]  2  6 10

AntiSubset(x, c(1, 3))
#[1]  2  6 10

*/

edited Dec 31 '15 at 23:16

answered Dec 31 '15 at 19:29

nrussell

18,382
4
47
60

Thank you so much! I made some edits to your code to better reflect R's 1-indexing so that the input `idx` can be the same as R. Of course this is purely personal preference. I also had to change the line `LogicalVector result(n, false);` because the compiler would complain about it being ambiguous. I also added benchmarking results. This implementation is slower than R but unless the vector or `idx` is very large, the difference should be trivial. Which makes me wonder how R does it... – aenima Dec 31 '15 at 21:08
Well I left it as 0 based indexing on the premise that you needed to call the function from within C++. And yes calling this from R will most likely be slower than native indexing because that is a very primitive operation in R, predominantly handled by C code. But again, if you *need* to call this from C++ directly, it doesn't make sense to compare it with something executed in R, because the R subsetting won't be available to you from your C++ code. – nrussell Dec 31 '15 at 21:39
I agree with you that benchmarking against R is probably not necessary as I will be calling it within C++. Thanks again. – aenima Dec 31 '15 at 22:05
After some more testing of your code and mine, I do believe that there is an advantage to formatting the input `idx` as 1-based in `logical_index()`. Suppose we want to remove only the 1st element in `v1`. Using your code, you would have to say `Subset(v1, -0)`, but the result is actually obtaining the 1st element of `v1` because `logical_index()` has no way of knowing whether you want to obtain or remove `v1(0)` since 0 has no sign. – aenima Dec 31 '15 at 22:19
A fair point; but then again, you would probably be better off just having a dedicated function for negative indexing (dropping elements) altogether, rather than intermingling the two concepts (positive vs. negative indices & 0-based vs. 1-based indices). Rcpp already offers native support for (positive) index subsetting anyways, so you really only need to address the counterpart (dropping elements) in a separate function. – nrussell Dec 31 '15 at 22:27
Yes you are right. But I really like your code since it can do all kinds of indexing in a very similar fashion to R. I just thought the dropping-1st-element issue was worth mentioning because it is a potential problem for dropping elements in general. – aenima Dec 31 '15 at 23:26
Yes, but at the end of the day R is not C++ and you will inevitably have to make some syntactic trade-offs. Of course, the above was just the first approach that came to mind. You are more than free to use / modify as little or as much of it as you desire in your own code. – nrussell Dec 31 '15 at 23:32

caseyk · Answer 2 · 2016-08-23T15:13:51.960

Here's a function I wrote that does this task. Not using negative indices, but via a function call. It's a bit slower then the R function on my benchmarks (small vectors). Maybe someone can build on it, I haven't tested nrussel's code so this could be inferior. Edit - if you are passing an R vector containing indices for removal change the if statement from "inds(j) == i" to "inds(j)-1 == i" (I believe).

Note - The performance could be enhanced by putting a lower limit on the inner loop based on what indices are found. Assuming of course the index vector is sorted in ascending order.

arma::uvec rmIndices( unsigned int vecsize, arma::uvec inds){
  unsigned int negInds = 0, p = inds.size();
  bool foundMatch = false;
  arma::uvec neg_inds(vecsize - p);

  for(unsigned int i = 0; i < vecsize; ++i){

    for(unsigned int j = 0; j < p; ++j){
      if(inds(j) == i){//Check if we have a match between the index and the specified value
        foundMatch = true;
      }
    }//End inner loop
    if(!foundMatch){
      neg_inds(negInds) = i;
      negInds = negInds + 1;//We have a match so, go to next position.
    }
    foundMatch = false;
  }

  return( neg_inds );
}

Rcpp/RcppArmadillo: removing non-contiguous elements from a vector based on positions

2 Answers2

Linked