What would be the computationally faster way to implement this 2D numerical integration?

Question

I am interested in doing a 2D numerical integration. Right now I am using the scipy.integrate.dblquad but it is very slow. Please see the code below. My need is to evaluate this integral 100s of times with completely different parameters. Hence I want to make the processing as fast and efficient as possible. The code is:

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def f(q, z, t):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)


y = np.empty([len(q)])
for n in range(len(q)):
    y[n] = integrate.dblquad(lambda t, z: f(q[n], z, t), 0, 50, lambda z: 10, lambda z: 60)[0]

end = time.time()
print(end - start)

Time taken is

212.96751403808594

This is too much. Please suggest a better way to achieve what I want to do. I tried to do some search before coming here, but didn't find any solution. I have read quadpy can do this job better and very faster but I have no idea how to implement the same. Please help.

It seems that your code currently works, and you are looking to improve it. Generally these questions are too opinionated for this site, but you might find better luck at [CodeReview.SE](//codereview.stackexchange.com/tour). Remember to read [their requirements](//codereview.stackexchange.com/help/on-topic) as they are a bit more strict than this site. — David Buck, Apr 04 '20 at 10:21
@DavidBuck Thanks a lot for the suggestion. If you feel so, I would post it on CodeReview. I posted it here because I was hoping to get suggestions along with code improvement. If others feel the same way, I will take it down. Cheers :) — Shankar_Dutt, Apr 04 '20 at 10:35
@David, are you active on CodeReview and prepared to answer this there? If not don't recommend it, especially for `numpy` questions. — hpaulj, Apr 04 '20 at 14:28
You already asked about `quadpy` in https://stackoverflow.com/questions/60905349/problems-integrating-using-quadpy. Often asking for recommendations of other packages to solve a problem is frowned upon in SO. — hpaulj, Apr 04 '20 at 16:27
You need to acknowledge and build on help that you got in the previous question. And try to apply the feedback that you got from your previous CR posting. — hpaulj, Apr 04 '20 at 20:24
For scalar calculations `math.sqrt` and `math.exp` (and `math.erf`) are faster than the `np` equivalents. — hpaulj, Apr 04 '20 at 20:30
Since the `t` and `z` boundaries are fixed, could you do just as well with two `quad` or `quad_vec`? In the previous question `quad_vec` is actually faster than `quadpy`. — hpaulj, Apr 04 '20 at 20:35
@hpaulj Thanks a lot for your replies. The last question I asked was for 1D integral. I needed help for 2D integration, that's why I asked this one. I really like your idea of using two ```quad``` or ```quad_vec```, but I have no idea how to use in my case. I would appreciate any suggestions/help regarding the same :) — Shankar_Dutt, Apr 04 '20 at 20:40
There's no dblquad in quadpy yet, but I was planning to add it these days. — Nico Schlömer, Apr 05 '20 at 18:47
@NicoSchlömer Thanks a lot for your answer. Is there any faster way to do it using ```quadpy```. As suggested above maybe using quad twice or any other way? — Shankar_Dutt, Apr 05 '20 at 19:02
Your CR post got a good answer, basically improving the run time of your equation `f()`. I suggested similar things with your previous 1d question. We improve these `quad` times by either by making `f()` faster, or using a smarter `quad` (which may call `f` fewer times). `quad_vec` improves speed by asking `f` for multiple values at once. — hpaulj, Apr 06 '20 at 16:08

max9111 · Accepted Answer · 2020-05-04T18:02:50.627

You could use Numba or a low-level-callable

Almost your example

I simply pass function directly to scipy.integrate.dblquad instead of your method using lambdas to generate functions.

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.


y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner,args=(q[n],))[0]

end = time.time()
print(end - start)
#143.73969149589539

This is already a tiny bit faster (143 vs. 151s) but the only use is to have a simple example to optimize.

Simply compiling the functions using Numba

To get this to run you need additionally Numba and numba-scipy. The purpose of numba-scipy is to provide wrapped functions from scipy.special.

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time
import numba as nb

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

#error_model="numpy" -> Don't check for division by zero
@nb.njit(error_model="numpy",fastmath=True)
def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.


y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner,args=(q[n],))[0]

end = time.time()
print(end - start)
#8.636585235595703

Using a low level callable

The scipy.integrate functions also provide the possibility to pass C-callback function instead of a Python function. These functions can be written for example in C, Cython or Numba, which I use in this example. The main advantage is, that no Python interpreter interaction is necessary on function call.

An excellent answer of @Jacques Gaudin shows an easy way to do this including additional arguments.

import numpy as np
from scipy import integrate
from scipy.special import erf
from scipy.special import j0
import time
import numba as nb
from numba import cfunc
from numba.types import intc, CPointer, float64
from scipy import LowLevelCallable

q = np.linspace(0.03, 1.0, 1000)

start = time.time()

def jit_integrand_function(integrand_function):
    jitted_function = nb.njit(integrand_function, nopython=True)

    #error_model="numpy" -> Don't check for division by zero
    @cfunc(float64(intc, CPointer(float64)),error_model="numpy",fastmath=True)
    def wrapped(n, xx):
        ar = nb.carray(xx, n)
        return jitted_function(ar[0], ar[1], ar[2])
    return LowLevelCallable(wrapped.ctypes)

@jit_integrand_function
def f(t, z, q):
    return t * 0.5 * (erf((t - z) / 3) - 1) * j0(q * t) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
        -0.5 * ((z - 40) / 2) ** 2)

def lower_inner(z):
    return 10.

def upper_inner(z):
    return 60.


y = np.empty(len(q))
for n in range(len(q)):
    y[n] = integrate.dblquad(f, 0, 50, lower_inner, upper_inner,args=(q[n],))[0]

end = time.time()
print(end - start)
#3.2645838260650635

Vegard Gjeldvik Jervell · Answer 2 · 2020-04-06T13:08:23.577

Generally it is much, much faster to do a summation via matrix operations than to use scipy.integrate.quad (or dblquad). You could rewrite your f(q, z, t) to take in a q, z and t vector and return a 3D-array of f-values using np.tensordot, then multiply your area element (dtdz) with the function values and sum them using np.sum. If your area element is not constant, you have to make an array of area-elements and use np.einsum To take your integration limits into account you can use a masked array to mask the function values outside your integration limits before summarizing. Take note that np.einsum overlooks the masks, so if you use einsum you can use np.where to set function values outside your integration limits to zero. Example (with constant area element and simple integration limits):

import numpy as np
import scipy.special as ss
import time

def f(q, t, z):

    # Making 3D arrays before computation for readability. You can save some time by
    # Using tensordot directly when computing the output
    Mq = np.tensordot(q, np.ones((len(t), len(z))), axes=0)
    Mt = np.tensordot(np.ones(len(q)), np.tensordot(t, np.ones(len(z)), axes = 0), axes = 0)
    Mz = np.tensordot(np.ones((len(q), len(t))), z, axes = 0)

    return Mt * 0.5 * (ss.erf((Mt - Mz) / 3) - 1) * (Mq * Mt) * (1 / (np.sqrt(2 * np.pi) * 2)) * np.exp(
     -0.5 * ((Mz - 40) / 2) ** 2)

q = np.linspace(0.03, 1, 1000)
t = np.linspace(0, 50, 250)
z = np.linspace(10, 60, 250)

#if you have constand dA you can shave some time by computing dA without using np.diff
#if dA is variable, you have to make an array of dA values and np.einsum instead of np.sum
t0 = time.process_time()
dA = np.diff(t)[0] * np.diff(z)[0]
func_vals = f(q, t, z)
I = np.sum(func_vals * dA, axis=(1, 2))
t1 = time.process_time()

this took 18.5s on my 2012 macbook pro (2.5GHz i5) with dA = 0.04. Doing things this way also allows you to easily choose between precision and efficiency, and to set dA to a value that makes sense when you know how your function behaves.

However, it is worth noting that if you want a larger amount of points, you have to split up your integral, or else you risk maxing out your memory (1000 x 1000 x 1000) doubles requires 8GB of ram. So if you are doing very big integrations with high presicion it can be worth doing a quick check on the memory required before running.

What would be the computationally faster way to implement this 2D numerical integration?

2 Answers2

You could use Numba or a low-level-callable

Linked