
I want to create a C++ function that raises each element of `x` to a power and takes the mean. I've created three versions:

  • power_mean_R: R solution -- mean(x^power)
  • power_mean_C: C++ solution
  • power_mean_C_2arg: C++ solution with extra power argument

The extra power argument seems to drastically slow down the function to the point that it's slower than the R implementation. Is this a reality of using Rcpp or is there something I can improve in my code?

library(Rcpp)

cppFunction(
    'double power_mean_C_2arg(NumericVector x, double power) {

        int n = x.size();
        double total = 0;

        for(int i=0; i<n; ++i) {
            total += pow(x[i], power);
        }

        return total / n;

    }'
)

cppFunction(
    'double power_mean_C(NumericVector x) {

        int n = x.size();
        double total = 0;

        for(int i=0; i<n; ++i) {
            total += pow(x[i], 2);
        }

        return total / n;

    }'
)

power_mean_R <- function(x, power) {
    mean(x^power)
}

bench::mark(
    R = power_mean_R(1:100, power = 2),
    C = power_mean_C(1:100),
    C2arg = power_mean_C_2arg(x = 1:100, power = 2)
)

  expression    min median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory
  <bch:expr> <bch:> <bch:>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>
1 R          5.91µs 6.91µs   112386.    1.27KB      0   10000     0       89ms <dbl … <Rpro…
2 C          2.87µs 3.54µs   231281.    3.32KB     23.1  9999     1     43.2ms <dbl … <Rpro…
3 C2arg      6.02µs 6.89µs   116187.    3.32KB      0   10000     0     86.1ms <dbl … <Rpro…

Jeff Bezos
  • I'm new to `Rcpp`, so any general feedback is helpful as well; even `power_mean_C` seems a little slower than I expected – Jeff Bezos Dec 10 '20 at 21:25
  • 1. The R function is already as efficient as it gets and screams _already vectorised_ at you. 2. You can win a little with fewer safety checks, but that is not recommended. 3. You could win more via OpenMP, but that is more work. In short, this is not a benchmark where you can score an easy win. – Dirk Eddelbuettel Dec 10 '20 at 21:29
  • `pow(x[i], 2)` is likely slower than `x[i]*x[i]` – Öö Tiib Dec 10 '20 at 21:31
  • Also notice that `^` and `mean` are effectively just wrappers around `.Primitive` and `.Internal` functions, which are C/C++ functions. – DanY Dec 10 '20 at 21:33
  • Daniel, I have discussed ("debated"?) this similar topic with Dirk (many years ago), and he "won" and has been proven correct so many times since then. While I understood the adage *"Premature Optimization Is the Root of All Evil"*, I did not really put it to practice (well), thinking that I could fix everything by going straight to `Rcpp`. In short, Rcpp *is awesome*, but it cannot really improve this kind of thing within R. And by "this kind of thing", I mean `.Primitive` functions that have stood the test of time and are written/vectorized really well. – r2evans Dec 10 '20 at 21:39
  • (My next question back at you would be: why are you trying to compile and improve *this functionality*? There are plenty of other math-y things out there that R does "okay" and can be done (sometimes "much") faster in Rcpp, but not vanilla power/mean.) – r2evans Dec 10 '20 at 21:41
  • I'm using this function to aggregate groups in a large data.table: `DT[, lapply(.SD, function(x) {mean(x ^ (1 / power))}), by = group]` – Jeff Bezos Dec 10 '20 at 21:44
  • You are *really* starting from a bit of a self-dug hole. `data.table` is rather famously optimised and in general you will not get better by just tossing a (naive) Rcpp function in the mix. – Dirk Eddelbuettel Dec 11 '20 at 04:14
  • I would suggest creating a new, specific question about the operation you are trying to optimize. If I am correct, your approach (`{mean(x ^ (1 / power))}`) doesn't use `data.table`'s optimizations for the `mean` calculation, so maybe there's some room for improvement. – minem Dec 11 '20 at 08:37

1 Answer


There are a few things that handicap your C++ function in the examples you gave.

First, `1:100` is an ALTREP sequence, for which a highly optimized `sum` method exists. In the extreme example below, the ALTREP version is over 6 million times faster to sum. Obviously your real data won't be ALTREP all the way through your pipeline, but it's a bad idea to benchmark on ALTREP sequences.

billion <- c(1L, 2:1e9)
billalt <- 1:1e9

identical(billion, billalt)
#> [1] TRUE

bench::mark(sum(billion), sum(billalt))
#> # A tibble: 2 x 10
#>   expression             min            mean          median          max
#>   <chr>             <bch:tm>        <bch:tm>        <bch:tm>     <bch:tm>
#> 1 sum(billi~ 614564900.000ns 614564900.000ns 614564900.000ns 614564.900us
#> 2 sum(billa~       100.000ns       312.530ns       200.000ns     23.300us
#> # ... with 5 more variables: `itr/sec` <dbl>, mem_alloc <bch:byt>, n_gc <dbl>,
#> #   n_itr <int>, total_time <bch:tm>

Created on 2020-12-11 by the reprex package (v0.3.0)

Second, `1:100` is an integer vector, but your Rcpp function accepts a `NumericVector`, so the data must be coerced to type `double` before any operations are done. For a vector this small, that coercion is likely a considerable part of the overhead.

Third, your test vector is very small, so fixed per-call overheads, such as Rcpp's preservation of the random seed, are going to dominate the differences.
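If seed preservation is indeed a measurable cost here, Rcpp attributes provide a switch to skip saving and restoring the RNG state. A sketch (this requires `sourceCpp`-style attributes rather than `cppFunction`, and the function name is illustrative; `rng = false` is only safe because the function does not use R's random number generator):

```cpp
#include <Rcpp.h>

// rng = false tells Rcpp not to save/restore R's RNG state around the call,
// removing that per-call overhead.
// [[Rcpp::export(rng = false)]]
double power_mean_norng(Rcpp::NumericVector x, double power) {
    double total = 0.0;
    int n = x.size();
    for (int i = 0; i < n; ++i) total += std::pow(x[i], power);
    return total / n;
}
```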

Hugh
  • Neither point 1 nor point 2 really has any bearing on the difference between the OP's three methods. Moreover, point 2 is a conjecture you could have tested, as there is a switch to not preserve RNG values. Lastly, `sum()` is not really the OP's problem per his last comment. – Dirk Eddelbuettel Dec 11 '20 at 04:12