
I'm trying to optimize evaluation of an integral (scipy.integrate.quad) over a function containing Bessel functions with Numba.

While Numba seems to work well for "common" numpy functions, it throws an error when I attempt to include the Bessel function:

Untyped global name 'jn': cannot determine Numba type of <class 'numpy.ufunc'>
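For reference, here is a minimal sketch of the kind of setup that produces this error (the integrand and integration limits are just placeholders):

import numba as nb
from scipy.special import jn
from scipy.integrate import quad

@nb.jit(nopython=True)
def integrand(x):
    return jn(1, x) * x  # jn is a NumPy ufunc, which nopython mode cannot type

quad(integrand, 0.0, 10.0)  # compilation is triggered here and fails with the error above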

From a bit of Googling, I have found a Jupyter notebook from the Numba repository that discusses making a j0 function (https://github.com/numba/numba/blob/08d5c889491213288be0d5c7d726c4c34221c35b/examples/notebooks/j0%20in%20Numba.ipynb).

The notebook claims that implementing the function in Numba will be fast, yet the timing results shown at the end indicate ~100x slower performance with Numba. Am I missing something obvious here?

And more generally, is it possible to benefit from Numba compilation when scipy Bessel functions are involved?

SLater01
  • Not my area of expertise, but these functions are probably highly optimized Fortran code, which might be a source of trouble for Numba (not sure) and will certainly limit how much the function itself can be optimized. – sascha Sep 30 '17 at 11:20
  • Looks like the scipy functions are in C (https://github.com/scipy/scipy/tree/2526df72e5d4ca8bad6e2f4b3cbdfbc33e805865/scipy/special/cephes). I guess it would be great if Numba could just use the code from these files, but from much reading around online, I've yet to find a solution! – SLater01 Sep 30 '17 at 12:19

2 Answers


Normally NumPy and SciPy provide very fast implementations. Numba, on the other hand, auto-generates code based on a Python function.

So you can't apply numba to anything except Python functions, and if you want nopython mode (you want it if you're interested in speed), not even to every Python function. Numba supports only a very limited set of functions and types, and these functions are all re-implemented in Numba; it doesn't use the Python or NumPy implementations at all, even if it looks like it does!

So you have auto-generated LLVM code vs. highly optimized, custom-made C/Fortran code, and you shouldn't expect to gain anything (although numba generally performs great for very small arrays compared to the NumPy/SciPy functions, for even medium-sized arrays numba will be slower).

But if you want to use some (currently unsupported) function inside a numba-jitted function, you have to re-implement it yourself. Unless you're calling it in a long, tight loop it won't be worth the trouble; just use a normal function. Developer time is often far more important than runtime.
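For example (a sketch only, with placeholder names): the truncated power series J0(x) = sum_k (-1)^k (x/2)^(2k) / (k!)^2 can be jitted and called from a jitted integrand. The series is only accurate for modest |x| and is meant to illustrate the pattern, not to compete with SciPy's implementation:

import numba as nb
from scipy.integrate import quad

@nb.jit(nopython=True)
def j0_series(x):
    # Truncated power series for J0; each term is built from the previous one.
    total = 0.0
    term = 1.0
    half_x_sq = (0.5 * x) ** 2
    for k in range(1, 31):
        total += term
        term *= -half_x_sq / (k * k)
    return total + term

@nb.jit(nopython=True)
def integrand(x):
    # Hypothetical integrand combining the re-implemented helper with other work.
    return j0_series(x) * x

value, err = quad(integrand, 0.0, 5.0)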

That doesn't mean numba isn't great. It's great for tasks that require many computations/loops and cannot be realized using existing NumPy or SciPy functions.

MSeifert
  • Thanks for this insight. By using Numba, I feel I could gain a significant performance boost by passing my integrand to scipy's quad as compiled code (testing with other, non-Bessel functions confirms a >10x speed-up) rather than Python code. However, this only works if Numba can compile the target function, which looks impossible for now if Bessel functions are present. Are there any alternatives here that could improve performance? – SLater01 Sep 30 '17 at 13:21
  • It depends a bit on where the bottleneck is. If the bottleneck is the Bessel functions, you probably can't do anything (except looking for a faster implementation, which may or may not exist). But if the bottleneck is elsewhere you could use Cython. Cython is a bit more complicated than numba and requires a compilation step (though that's really easy if you use IPython or Jupyter notebooks). It could also be possible to use numba's CFFI support to hook into the SciPy implementation of the Bessel functions (see the sketch below), but that's a bit beyond my capabilities (and I'm not sure it will work out). – MSeifert Sep 30 '17 at 13:25
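For reference, the "hook into the SciPy implementation" idea from the comment above can be sketched with numba.extending.get_cython_function_address plus ctypes instead of CFFI. This is assumption-laden: it assumes scipy.special.cython_special exports a plain double -> double j1 under that name (fused functions get mangled names such as __pyx_fuse_1jv, and exact exported names can differ between SciPy versions):

import ctypes
import numba as nb
from numba.extending import get_cython_function_address
from scipy.integrate import quad

# Look up the address of SciPy's compiled j1 and wrap it as a ctypes function;
# ctypes function pointers can be called from nopython-mode code.
addr = get_cython_function_address("scipy.special.cython_special", "j1")
functype = ctypes.CFUNCTYPE(ctypes.c_double, ctypes.c_double)
cython_j1 = functype(addr)

@nb.jit(nopython=True)
def integrand(x):
    return cython_j1(x) * x  # hypothetical integrand

value, err = quad(integrand, 0.0, 10.0)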

This is the code you are looking for:

import cmath
import numba as nb

# Bessel function of the first kind, order 1 - note that the smaller terms are added first
@nb.jit(nopython=True, nogil=True, cache=False)
def Bessel1(z):
    if z.real <= 8.:
        # Small-argument polynomial approximation in t = z/8
        t = z / 8.
        fz = z * (-0.2666949632 * t**14 + 1.7629415168000002 * t**12 + -5.6392305344 * t**10 + 11.1861160576 * t**8 + -14.1749644604 * t**6 + 10.6608917307 * t**4 + -3.9997296130249995 * t**2 + 0.49999791505)
    else:
        # Large-argument asymptotic form in t = 8/z
        t = 8. / z
        eta = z - 0.75 * cmath.pi
        fz = cmath.sqrt(2 / (cmath.pi * z)) * ((1.9776e-06 * t**6 + -3.48648e-05 * t**4 + 0.0018309904000000001 * t**2 + 1.00000000195) * cmath.cos(eta) - (-6.688e-07 * t**7 + 8.313600000000001e-06 * t**5 + -0.000200241 * t**3 + 0.04687499895 * t) * cmath.sin(eta))
    return fz

I can't remember the precision of this particular implementation, though. Note that I have written it for complex arguments, which you may or may not require.

I pulled this out of an old computing textbook a year or so ago. It is kind of like a Taylor series expansion. I forget what the exact method is called.

Note in the code that the higher-order powers of "t" must be added first. Because t is always less than 1, those are the smallest terms, and rounding errors are much larger if you add small floating-point numbers to an already large partial sum.
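To get a rough idea of the precision, one quick check (a sketch, assuming SciPy is available for reference values) is to compare against scipy.special.j1 at a few points:

from scipy.special import j1

# Compare the approximation against SciPy's J1 for real arguments.
for x in (0.5, 2.0, 8.0, 20.0):
    approx = Bessel1(complex(x)).real
    print(x, approx, j1(x), abs(approx - j1(x)))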

Matt Thompson