
From Efficient R programming's section on the byte compiler and the R documentation for the byte compiler, I learnt that cmpfun can be used to compile a pure R function into byte code to speed it up, and that enableJIT can speed things up by enabling just-in-time compilation.
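
For reference, a minimal way to check whether the JIT is active and whether a function has actually been byte-compiled (just a quick sketch of mine, assuming R >= 3.4 where the JIT level defaults to 3):

library("compiler")

# A negative argument reports the current JIT level without changing it;
# on R >= 3.4 this should print 3 (JIT enabled by default).
enableJIT(-1)

f = function(x) {
    total = 0
    for (each in x) total = total + each
    total / length(x)
}

# Small closures may only be JIT-compiled before their second use,
# so call the function twice before checking.
f(1:10)
f(1:10)

# Once compiled, printing the function shows a "<bytecode: ...>" line.
print(f)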

So, I decided to run the benchmark from the first link myself, using the following code:

library("compiler")
library("rbenchmark")

enableJIT(3)

my_mean = function(x) {
    total = 0
    n = length(x)
    for (each in x)
        total = total + each
    total / n
}

cmp_mean = cmpfun(my_mean, list(optimize = 3))

## Generate some data
x = rnorm(100000)
benchmark(my_mean(x), cmp_mean(x), mean(x),
          columns = c("test", "elapsed", "relative"),
          order = "relative", replications = 5000)

Unfortunately, the result is nowhere near what the first link shows. The performance of my_mean is even slightly better than that of cmp_mean:

         test elapsed relative
3     mean(x)   1.468    1.000
1  my_mean(x)  35.402   24.116
2 cmp_mean(x)  36.817   25.080

I can't figure out what has happened.

Edit:

The R version on my computer is 3.5.2.

The operating system is Debian 9.8. All software on my computer is up to date with the stable repositories provided by Debian.

The Linux kernel version is 4.9.0-8-amd64.

Edit 5:

I rewrote the script to test different combinations of optimize and JIT levels:

#!/usr/bin/env Rscript

library("compiler")
library("microbenchmark")
library("rlist")

my_mean = function(x) {
    total = 0
    n = length(x)
    for (each in x)
        total = total + each
    total / n
}

do_cmpfun = function(f, f_name, optimization_level) {
    cmp_f = cmpfun(f, list(optimize = optimization_level))
    list(cmp_f, f_name, optimize = optimization_level)
}

do_benchmark = function(f, f_name, optimization_level, JIT_level, x) {
    result = summary(microbenchmark(f(x), times = 1000, unit = "us", control = list(warmup = 100)))
    data.frame(fun = f_name, optimize = optimization_level, JIT = JIT_level, mean = result$mean)
}

means = list(list(mean, "mean", optimize = -1), list(my_mean, "my_mean", optimize = -1))

for (optimization_level in 0:3)
    means = list.append(means, do_cmpfun(my_mean, "my_mean", optimization_level))

# Generate some data
x = rnorm(100000)

# Benchmark in different JIT levels
result = c()
for (JIT_level in 0:3) {
    enableJIT(JIT_level)

    for (f in means) {
        result = rbind(result, do_benchmark(f[[1]], f[[2]], f[[3]], JIT_level, x))
    }
}


# Sort result
sorted_result = result[order(result$mean), ]
rownames(sorted_result) = NULL

print("Unit = us, optimize = -1 means it is not processed by cmpfun")
print(sorted_result)

I ran sudo cpupower frequency-set --governor performance before running the R script and got this:

[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
1     mean       -1   2   229.1841
2     mean       -1   1   229.3910
3     mean       -1   3   236.3680
4     mean       -1   0   252.9416
5  my_mean       -1   2  5242.0413
6  my_mean        3   0  5279.9710
7  my_mean        2   2  5297.5323
8  my_mean        2   1  5327.0324
9  my_mean       -1   1  5333.6941
10 my_mean        3   1  5336.4559
11 my_mean        2   0  5362.6644
12 my_mean        3   3  5410.1963
13 my_mean        2   3  5414.4616
14 my_mean       -1   3  5418.3823
15 my_mean        3   2  5437.3233
16 my_mean        1   2  9947.7897
17 my_mean        1   1 10101.6464
18 my_mean        1   3 10204.3253
19 my_mean        1   0 10323.0782
20 my_mean        0   0 26557.3808
21 my_mean        0   2 26728.5222
22 my_mean       -1   0 26901.4200
23 my_mean        0   3 26984.5200
24 my_mean        0   1 27060.6188

However, after I switched libblas.so.3 and liblapack.so.3 via update-alternatives to the versions provided by OpenBLAS 0.2.19-3, my_mean with optimize = 3 and JIT = 0 becomes the best performer (excluding mean):

[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
1     mean       -1   0   228.9361
2     mean       -1   1   229.1223
3     mean       -1   2   233.9757
4     mean       -1   3   241.7835
5  my_mean        3   0  5246.8089
6  my_mean       -1   1  5261.3951
7  my_mean       -1   2  5330.6310
8  my_mean        2   3  5362.2055
9  my_mean        3   1  5400.9983
10 my_mean        2   0  5418.7674
11 my_mean        2   1  5460.8133
12 my_mean        3   3  5464.8280
13 my_mean       -1   3  5520.7021
14 my_mean        2   2  5591.7352
15 my_mean        3   2  5610.6446
16 my_mean        1   3 10244.2832
17 my_mean        1   0 10274.7504
18 my_mean        1   1 10311.6423
19 my_mean        1   2 10735.6449
20 my_mean        0   2 26904.1858
21 my_mean       -1   0 26961.0536
22 my_mean        0   0 27115.8191
23 my_mean        0   3 27538.7224
24 my_mean        0   1 28133.6159

The same test with MKL 2019.02-057:

[1] "Unit = us, optimize = -1 means it is not processed by cmpfun"
       fun optimize JIT       mean
1     mean       -1   1   257.8620
2     mean       -1   0   263.3743
3     mean       -1   2   280.6906
4     mean       -1   3   291.8409
5  my_mean        2   0  5445.3252
6  my_mean        2   2  5462.4575
7  my_mean        3   3  5560.2931
8  my_mean       -1   1  5591.0089
9  my_mean        3   1  5645.3897
10 my_mean        3   0  5676.1714
11 my_mean        3   2  5707.7964
12 my_mean        2   3  5757.7887
13 my_mean       -1   3  5856.0215
14 my_mean       -1   2  5897.1735
15 my_mean        2   1  6363.1090
16 my_mean        1   2  9973.7666
17 my_mean        1   1 10557.8154
18 my_mean        1   0 10926.6103
19 my_mean        1   3 16030.0326
20 my_mean        0   0 27461.4078
21 my_mean        0   1 27939.7680
22 my_mean       -1   0 27985.4590
23 my_mean        0   3 30394.2772
24 my_mean        0   2 33768.5701
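
For reference, a quick way to confirm which BLAS/LAPACK libraries the R session is actually linked against after switching alternatives (just a side check, not part of the benchmark) is sessionInfo(), which reports the library paths on R >= 3.4:

# sessionInfo() lists the BLAS and LAPACK shared libraries in use
# near the top of its output (R >= 3.4).
sessionInfo()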
  • I also get that `my_mean()` and `cmp_mean()` seem to be in a virtual tie for all problem sizes. I can't replicate the graph in that link which shows that their `cmp_mean` is dramatically better. Not sure when that was written. Perhaps there has been an improvement in the R interpreter since then and that now the optimizations which `cmpfun()` does are done automatically? Alternatively, a glance at the table of contents shows something about changing which BLAS R uses. Perhaps their results are when a non-default BLAS is used? It is a good question. – John Coleman Mar 01 '19 at 11:55
  • Out of curiosity, I did the experiment both with and without `enableJIT` and also with `microbenchmark` rather than `benchmark`, and in no case does `cmpfun` seem to help but instead seems to (very slightly) hurt. – John Coleman Mar 01 '19 at 12:06
  • Since [3.4](https://cran.r-project.org/doc/manuals/r-release/NEWS.html) JIT is enabled by default. Maybe it has something to do with that (ie, book is outdated). You can try confirming that by testing on 3.3, 3.4, 3.5 – pogibas Mar 01 '19 at 12:39
  • You can test different versions of R with rstudio.cloud pretty easily. I just tested with R 3.1, but `cmp_mean` is about 3 times faster than `my_mean`. – amatsuo_net Mar 01 '19 at 12:47
  • The differences are very small and likely not significant. – James Mar 01 '19 at 13:03
  • @PoGibas It seems that `my_mean` has been faster than `cmp_mean` with `JIT = 2` since `3.3.3`. I did the benchmark on `rstudio.cloud`. – JiaHao Xu Mar 03 '19 at 07:03
  • However, this is not true in `3.4.4`. – JiaHao Xu Mar 03 '19 at 07:07

1 Answer


While I haven't figured out why JIT compilation isn't speeding up your code, we can speed up the same function by compiling it with the Rcpp package.

Doing so gives the following results (where mean_cpp is the function written and compiled using Rcpp):

         test elapsed relative
4 mean_cpp(x)    0.67    1.000
3     mean(x)    1.00    1.493
1  my_mean(x)   14.00   20.896
2 cmp_mean(x)   14.50   21.642

The code used to define and benchmark this function is given below.

library("compiler")
library("rbenchmark")
library("Rcpp")


enableJIT(3)

my_mean = function(x) {
  total = 0
  n = length(x)
  for (each in x)
    total = total + each
  total / n
}

cmp_mean = cmpfun(my_mean, list(optimize = 3))


#we can also write this same function using the Rcpp package
cppFunction('double mean_cpp(NumericVector x) {
  double total = 0;
  int n = x.size();
  for(int i = 0; i < n; i++) {
    total += x[i];
  }
  return total / n;
}')


#run once to compile
mean_cpp(c(1))


## Generate some data
x = rnorm(100000)
benchmark(my_mean(x), cmp_mean(x), mean(x), mean_cpp(x),
          columns = c("test", "elapsed", "relative"),
          order = "relative", replications = 5000)
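
As a quick sanity check (not part of the benchmark itself), you can verify that all three implementations agree with base R's mean before timing them:

# All implementations should agree up to floating point rounding.
all.equal(mean(x), my_mean(x))
all.equal(mean(x), cmp_mean(x))
all.equal(mean(x), mean_cpp(x))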