1

I wrote an R code to calculate the tail sum of a vector:

tailsum <- function(x){
   sum(x) + x - cumsum(x)
}

I hope to improve the efficiency of this function through RcppArmadillo, so I wrote

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

#include <Rcpp.h>
using namespace Rcpp;
using namespace arma;

// [[Rcpp::export]]
colvec tailsum_arma(const colvec &x){
  return sum(x) + x - cumsum(x);
}

NumericVector cumsum_self(const NumericVector &x){
  auto x_len = x.length();
  NumericVector y(x_len);
  y[0] = x[0];
  for(int i = 1;i < x_len; i++){
    y[i] = y[i - 1] + x[i];
  }
  return y;
}

// [[Rcpp::export]]
NumericVector tailsum_cpp(const NumericVector &x){
  //just to compare with tailsum_arma
  return sum(x) + x - cumsum_self(x);
}

But to my surprise, R code is more efficient than RcppArmadillo code:

> x <- rnorm(1000)
> microbenchmark(
+     tailsum(x),
+     tailsum_cpp(x),
+     tailsum_arma(x)
+ )
Unit: microseconds
            expr min  lq  mean median   uq  max neval cld
      tailsum(x) 2.0 2.3 2.826    2.5 2.70 14.4   100   a
  tailsum_cpp(x) 1.9 2.1 2.495    2.3 2.60  6.5   100   a
 tailsum_arma(x) 2.2 2.4 3.128    2.6 2.85 30.4   100   a

How can I improve my code written in RcppArmadillo?(I need to use RcppArmadillo because there are many other linear algebra operations that are done using RcppArmadillo.)

Chen Huang
  • 11
  • 1

1 Answers1

2

Interesting question, decent answer. We can improve a little more:

  • to be defensive, I recommend against using namespace ... and used explicit references "just to be sure"
  • no need to include Rcpp.h, RcppArmadillo.h does all we need (and it's a no-op)
  • your vector is too small to matter, I dialed up to 1e4, 1e5, 1e6 -- and then Armadillo sometimes ties R or beats it narrowly "in the mean"
  • as we see, "basic" R operations are direct and lean calls to compiled code so you do not necessarily "beat them" (unless you use OpenMP and other tricks)
  • why not combine both loops from tailsum_cpp and tailsum_self into one function to save calling overhead -- that ends up fastest
  • still, I would almost always pick your first approach with Armadillo functions in a one-liner

Here is what I get with the small changes and larger vector, more runs:

Unit: milliseconds
  expr     min      lq    mean  median      uq     max neval cld
     r 3.09441 3.58882 7.46590 5.42797 6.79353 181.518   500   b
  rcpp 3.57488 4.16185 8.15100 5.83417 7.26449 146.467   500   b
  arma 3.09324 3.77172 7.63546 5.50456 8.21274 221.112   500   b
 combo 2.72539 2.87299 4.58357 3.16753 4.95350 109.923   500  a 
> 

Code below

#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::colvec tailsum_arma(const arma::colvec &x){
    return arma::sum(x) + x - arma::cumsum(x);
}

Rcpp::NumericVector cumsum_self(const Rcpp::NumericVector &x){
    auto x_len = x.length();
    Rcpp::NumericVector y(x_len);
    y[0] = x[0];
    for (int i = 1;i < x_len; i++){
        y[i] = y[i - 1] + x[i];
    }
    return y;
}

// [[Rcpp::export]]
Rcpp::NumericVector tailsum_cpp(const Rcpp::NumericVector &x){
    //just to compare with tailsum_arma
    return sum(x) + x - cumsum_self(x);
}

// [[Rcpp::export]]
Rcpp::NumericVector tailsum_combo(const Rcpp::NumericVector &x){
    size_t x_len = x.length();
    double x_sum = Rcpp::sum(x);
    double csum = 0.0;
    Rcpp::NumericVector y(x_len);
    for (size_t i = 0; i < x_len; i++) {
        csum += x[i];
        y[i] = x_sum - csum + x[i];
    }
    return y;
}

/*** R

# But to my surprise, R code is more efficient than RcppArmadillo code:

tailsum_r <- function(x){
   sum(x) + x - cumsum(x)
}

x <- rnorm(1e6)
microbenchmark::microbenchmark(r = tailsum_r(x),
                               rcpp = tailsum_cpp(x),
                               arma = tailsum_arma(x),
                               combo = tailsum_combo(x),
                               times = 500)

*/

PS I had to remove the auto in the "combo" function as the compiler did some funky stuff with some of the returns. Here, for once, it helped to be explicit.

Dirk Eddelbuettel
  • 360,940
  • 56
  • 644
  • 725
  • Thanks for your detailed answer and useful tips of C++ which help me a lot!@Dirk Eddelbuettel – Chen Huang Apr 04 '21 at 14:52
  • @ChenHuang: You are welcome! Feel free to "upvote" (click on up-triangle) and/or "accept" (click on tickmark only you as question asker see) to signal that the answer is helpful / answers your question. That is StackOverflow signals and rewards between answers. – Dirk Eddelbuettel Apr 04 '21 at 15:15