
I want to integrate a function of the following form:

def f(z, t, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * np.exp(-0.5 * ((z - 40) / 2) ** 2)

I am using this function as an example to understand and show the difference between integration methods. Based on earlier answers on this platform and on the documentation, I have tried several methods to speed up the integration. I will explain and compare them briefly here:

1. Using plain Python and scipy.integrate.dblquad (takes 202.76 s)

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def f(q, z, t):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t)  * np.exp(-0.5 * ((z - 40) / 2) ** 2)


y = np.empty([len(q)])
for n in range(len(q)):
    y[n] = integrate.dblquad(lambda t, z: f(q[n], z, t), 0, 50, lambda z: 10, lambda z: 60)[0]

end = time.time()
print(end - start)

#202.76 s

As expected, this is not very fast.

2. Using a LowLevelCallable with the integrand compiled by Numba (takes 6.46 s)

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time
import numba as nb
from numba import cfunc
from numba.types import intc, CPointer, float64
from scipy import LowLevelCallable

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def jit_integrand_function(integrand_function):
    # njit already implies nopython=True, so no extra flag is needed
    jitted_function = nb.njit(integrand_function)

    # error_model="numpy" -> don't check for division by zero
    @cfunc(float64(intc, CPointer(float64)), error_model="numpy", fastmath=True)
    def wrapped(n, xx):
        ar = nb.carray(xx, n)
        return jitted_function(ar[0], ar[1], ar[2])
    return LowLevelCallable(wrapped.ctypes)

# note: scipy.special.erf/j0 inside a jitted function require the numba-scipy extension
@jit_integrand_function
def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.


y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner, args=(q[n],))[0]

end = time.time()
print(end - start)

#6.46 s

This is much faster than the earlier approach.
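Since each q value is an independent double integral, the outer loop over q also parallelizes trivially across processes. A minimal sketch of that idea with the plain-SciPy integrand (a small q grid is used purely for illustration, and the fork start method is assumed, so this is POSIX-only):

```python
import numpy as np
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor
from scipy import integrate
from scipy.special import erf, j0

# small q grid purely for illustration; the real problem uses 1000 points
q = np.linspace(0.03, 1.0, 8)

def f(q, z, t):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * np.exp(-0.5 * ((z - 40) / 2) ** 2)

def one_q(qn):
    # each q value is an independent double integral over the same domain
    return integrate.dblquad(lambda t, z: f(qn, z, t), 0, 50, 10, 60)[0]

# "fork" keeps the workers from re-importing this module (POSIX only)
with ProcessPoolExecutor(mp_context=mp.get_context("fork")) as ex:
    y = np.array(list(ex.map(one_q, q)))
```

The speedup is roughly the number of cores, minus process start-up overhead, so it only pays off for grids much larger than the one shown here.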

3. Using scipy.integrate.quad_vec (takes 2.87 s)

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def f(z, t, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * np.exp(-0.5 * ((z - 40) / 2) ** 2)

ans2 = integrate.quad_vec(lambda t: integrate.quad_vec(lambda z: f(z, t, q), 10, 60)[0], 0, 50)[0]

end = time.time()
print(end - start)
#2.87s

The quad_vec approach is the fastest of the three. I was wondering how to make it even faster. Unfortunately, scipy.quad_vec does not accept low-level callables; is there any way to vectorize scipy.quad or scipy.dblquad as efficiently as scipy.quad_vec? Or is there any other approach I could use? Any kind of help will be very much appreciated.
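As a sketch of one non-adaptive alternative, assuming a fixed-order rule is accurate enough for this smooth integrand (the orders nt = nz = 200 below are guesses to validate against the adaptive result): both integrals can be replaced by Gauss-Legendre quadrature, which reduces the whole computation to a few NumPy array operations vectorized over every q at once.

```python
import numpy as np
from scipy.special import erf, j0

q = np.linspace(0.03, 1.0, 1000)

# fixed-order Gauss-Legendre rules; the orders are assumptions to check
nt, nz = 200, 200
xt, wt = np.polynomial.legendre.leggauss(nt)
xz, wz = np.polynomial.legendre.leggauss(nz)

# map nodes/weights from [-1, 1] to [0, 50] (outer, t) and [10, 60] (inner, z)
t = 25 * (xt + 1)
wt = 25 * wt
z = 25 * (xz + 1) + 10
wz = 25 * wz

# inner integral over z for every t; note it does not depend on q
inner = ((erf((t[:, None] - z[None, :]) / 3) - 1)
         * np.exp(-0.5 * ((z[None, :] - 40) / 2) ** 2) * wz).sum(axis=1)

# outer integral over t, broadcast over all q at once
y = 0.5 * (wt * t * inner * j0(q[:, None] * t)).sum(axis=1)
```

Because no adaptive refinement happens, the accuracy has to be confirmed by increasing the orders until the result stops changing.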

Shankar_Dutt
  • I don't see that much difference between methods 2 and 3. Of course, you can implement everything (including quad) in C/Cython/Numba, e.g. https://github.com/Nicholaswogan/NumbaQuadpack. Then the outer loop is parallelizable, although this can be quite some work. Also be careful not to mix compilation time into the runtime measurement. – max9111 Oct 07 '21 at 13:24

1 Answer


Here is a 10x faster version:

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def fa(z, t, q):
    return (erf((t - z) / 3) - 1) * np.exp(-0.5 * ((z - 40) / 2) ** 2)

ans3 = 0.5 * integrate.quad_vec(lambda t: t * j0(q * t) *
         integrate.quad_vec(lambda z: fa(z, t, q), 10, 60)[0], 0, 50)[0]

end = time.time()
print(end - start)

What makes the difference here is taking j0(q * t) out of the inner integral; the inner integrand then no longer depends on q at all.

Bob
  • Thanks a lot, Bob, for your reply. You are absolutely right that taking the Bessel function out of the inner integral gives higher speed, but I chose this function as an example to show the difference between integration methods. I am more interested in whether we can further increase the speed of quad_vec, or vectorize quad in general more efficiently. For instance, starting from your solution, can we make it even faster by any means? – Shankar_Dutt Oct 07 '21 at 11:09
  • I think not for the general case. You could try to use Numba to compute the inner integral and quad_vec for the outer one. The problem is that adaptive methods use the function evaluations to decide the arguments for the next calls (refinements); you would have to use a non-adaptive method, and that would be way slower. – Bob Oct 07 '21 at 11:38
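Following the suggestion in the last comment, one possible sketch replaces the inner adaptive integral with a fixed-order Gauss-Legendre rule while keeping quad_vec for the outer one. Plain NumPy is used here instead of Numba, and the order of 100 is an assumption that should be validated against the adaptive answer:

```python
import numpy as np
from scipy import integrate
from scipy.special import erf, j0

q = np.linspace(0.03, 1.0, 1000)

# fixed 100-point Gauss-Legendre rule mapped from [-1, 1] to [10, 60]
xz, wz = np.polynomial.legendre.leggauss(100)
z = 25 * (xz + 1) + 10
wz = 25 * wz
gauss_z = np.exp(-0.5 * ((z - 40) / 2) ** 2)  # z-only factor, computed once

def inner(t):
    # non-adaptive inner integral over z; note it does not depend on q
    return np.sum(wz * (erf((t - z) / 3) - 1) * gauss_z)

# adaptive outer integral, vectorized over all q by quad_vec
ans = 0.5 * integrate.quad_vec(lambda t: t * j0(q * t) * inner(t), 0, 50)[0]
```

The outer integral keeps quad_vec's adaptive error control; only the inner rule's order needs to be checked by increasing it until the result stops changing.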