
I have been struggling with Numba: whenever I write a function for it, the first call has a very long warm-up. Is there a way to pre-warm the JIT-compiled function?

For example, if I write the function y = 1/(log(x+0.1))^2 as a Numba function:

@jit(parallel=True, error_model='numpy')
def f_numba(x_vec):
    N = len(x_vec)
    res = np.empty(N)
    for i in prange(N):
        x = x_vec[i]
        x = np.log(x + 0.1)
        res[i] = 1 / (x * x)
    return res

I used this array to test the speed of the function:

N = 150000
x_vect = np.random.rand(N)

To measure the execution time of the function I used this:

for i in range(5):
    start = timer()
    f_numba(x_vect)
    print('#', timer() - start)

The first run takes 0.8 seconds and every subsequent run takes about 0.001 seconds. It would be great if I could somehow pre-warm the JIT function to avoid this latency. I tried warming it up with a small dummy array, x_warm = np.random.rand(10), and then calling f_numba(x_warm), but the warm-up time didn't change at all. Any suggestions?

For completeness, here are the imports:

import numpy as np
from timeit import default_timer as timer
from numba import jit, prange

I'm using Jupyter Notebook with Python 3.7.

dani
    `jit` stands for "just in time" - it's intended to compile at runtime. Python is an interpreted language, not (normally) compiled, so pre-compiling isn't really a thing for most cases. It's possible (that's why `numpy` is so fast, as much of it is pre-compiled C code in the backend) but it's not generally a simple task. – Daniel F Nov 03 '20 at 09:22
    You could try `cache=True` to cache the compilation result. Initializing the caching algorithm can also take some time, but it has for sure a significant effect if you have more than one function. – max9111 Nov 03 '20 at 09:28
  • @DanielF but why is JIT faster than numpy for large arrays if numpy is precompiled? – dani Nov 03 '20 at 09:35
    Because `numpy` also has a lot of guard-rails built in to make sure that the package is 'pythonic' and flexible, as well as having some front-end `python` code. `numba` is super fast, but much less forgiving and dynamic. And for very large arrays and vectorized `numpy` code (which isn't always possible), `jit` isn't really much faster. – Daniel F Nov 03 '20 at 09:44
  • @max9111 the cache=True has no effect until I remove parallel=True, so I can't use multithreading when I create a cache file? – dani Nov 03 '20 at 09:46
    It was implemented a few releases ago, maybe you are on a very old numba version? Another alternative to numba is numexpr. – max9111 Nov 03 '20 at 09:47
  • @max9111 my numba is new but my CPU is old (core i7 3rd gen), so maybe that's the reason? My numpy is also 1.16.0 because of tensorflow. I also tried numexpr but it isn't as fast as numba. – dani Nov 03 '20 at 10:01
  • @max9111 I tried it on a newer CPU (AMD 3500u) with Numpy 1.19.3 and Numba 0.51.2. With both parallel=True, cache=True, the result is the same as parallel=True (as if cache=True is ignored). With only cache=True and without parallel=True I get the decreased warm-up (but at the expense of slower next iterations). – dani Nov 03 '20 at 10:25

0 Answers