In principle, once the byte-code is compiled and loaded, it should always run at least as fast as under the original AST interpreter. Some code benefits from big speedups; this is usually code with a lot of scalar operations and loops, where most of the time is spent in R interpretation (I've seen examples with a 10x speedup, though arbitrary micro-benchmarks can of course inflate this as needed). Some code runs at the same speed; this is usually well-vectorized code that spends nearly no time in interpretation. Compilation itself, however, can be slow. The just-in-time compiler therefore no longer compiles functions when it guesses that compilation won't pay off (the heuristics change over time; this is already in 3.4.x). The heuristics don't always guess right, so there will be situations where compilation does not pay off. Typical problematic patterns are code generation, code modification, and manipulation of bindings in environments captured by closures.
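As a rough illustration (a minimal sketch; the function and the exact speedup are just for illustration), the difference is easiest to see on a scalar loop compiled explicitly with the compiler package:

    library(compiler)
    enableJIT(0)             # disable the JIT so the two versions stay distinct

    # a scalar loop that spends most of its time in R-level interpretation
    f <- function(n) { s <- 0; for (i in seq_len(n)) s <- s + i * i; s }
    fc <- cmpfun(f)          # explicitly byte-compiled copy of f

    system.time(f(1e7))      # run by the AST interpreter
    system.time(fc(1e7))     # run by the byte-code interpreter, typically several times faster

    enableJIT(3)             # restore the default JIT level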
Packages can be byte-compiled at installation time so that the compilation cost is not paid (repeatedly) at run time, at least for code that is known ahead of time. This is now the default in the development version of R. While loading compiled code is much faster than compiling it, in some situations one may be loading even code that will never be executed, so there can actually be an overhead; overall, though, pre-compilation is beneficial. Recently some parameters of the GC have been tuned to reduce the cost of loading code that won't be executed.
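For R versions where installation-time compilation is not yet the default, a package can opt in via its DESCRIPTION file, or compilation can be requested at install time (the package name below is just a placeholder):

    # In the package's DESCRIPTION file:
    #   ByteCompile: yes

    # Or when installing a single package from the command line:
    #   R CMD INSTALL --byte-compile mypackage_1.0.tar.gz

    # Or for all package installations, via an environment variable:
    #   R_COMPILE_PKGS=1 R CMD INSTALL mypackage_1.0.tar.gz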
My recommendation for package writers is to use the defaults (just-in-time compilation is now on by default in released versions; byte-compilation at package installation time is now on in the development version). If you find an example where the byte-code compiler does not perform well, please submit a bug report (I've also seen such a case involving rpart in earlier versions). I would recommend against code generation and code manipulation, particularly in hot loops. This includes defining closures and deleting or inserting bindings in environments captured by closures. One definitely should not call eval(parse(text=...))
in hot loops (and this was already bad without byte-compilation). It is always better to use branches than to dynamically generate new closures (without branches). Likewise, it is better to write code with loops than to dynamically generate code with huge expressions (without loops). With the byte-code compiler it is now often fine to write loops operating on scalars in R: the performance won't be as bad as before, so one can more often get away without switching to C for the performance-critical parts.
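As a minimal sketch of what I mean by run-time code generation (the function names here are purely illustrative), compare building and parsing code on every call with ordinary subsetting:

    # Anti-pattern: constructing and parsing code at run time
    get_col_bad <- function(df, name) {
      eval(parse(text = paste0("df$", name)))
    }

    # Better: ordinary subsetting, no run-time code generation,
    # which the byte-code compiler handles well
    get_col_ok <- function(df, name) {
      df[[name]]
    }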