
I have been testing the following block to measure the speed-up from Numba:

import numpy as np
import timeit
from numba import njit
import numba

@numba.guvectorize(["void(float64[:], float64[:], float64[:], float64, float64, float64[:])"],
                   "(m),(m),(m),(),()->(m)", nopython=True, target="parallel")
def func_diff_calc_numba_v2(X, refY, Y, lower, upper, arr):
    fac = 1000
    for i in range(len(X)):
        if X[i] >= lower and X[i] < upper:
            diff = Y[i] - refY[i]
            arr[i] = diff**2 * fac
        else:
            arr[i] = 0

@numba.vectorize(["float64(float64, float64, float64, float64, float64)"],
                 nopython=True, target="parallel")
def func_diff_calc_numba_v3(X, refY, Y, lower, upper):
    fac = 1000
    if X >= lower and X < upper:
        return (Y - refY)**2 * fac
    else:
        return 0.0

@njit
def func_diff_calc_numba(X, refY, Y, lower, upper):
    fac = 1000
    arr = np.zeros(len(X))
    for i in range(len(X)):
        if X[i] >= lower and X[i] < upper:
            arr[i] = (Y[i] - refY[i])**2 * fac
        else:
            arr[i] = 0
    return arr

np.random.seed(69)
X=np.arange(10000)
refY = np.random.rand(10000)
Y = np.random.rand(10000)

lower=1
upper=10000

print("func_diff_calc_numba: {:.5f}".format(timeit.timeit(stmt="func_diff_calc_numba(X,refY,Y,lower,upper)", number=10000, globals=globals())))
print("func_diff_calc_numba_v2: {:.5f}".format(timeit.timeit(stmt="func_diff_calc_numba_v2(X,refY,Y,lower,upper)", number=10000, globals=globals())))
print("func_diff_calc_numba_v3: {:.5f}".format(timeit.timeit(stmt="func_diff_calc_numba_v3(X,refY,Y,lower,upper)", number=10000, globals=globals())))

The timings for v2 and v3 are significantly different:

func_diff_calc_numba: 0.58257
func_diff_calc_numba_v2: 0.49573
func_diff_calc_numba_v3: 1.07519

and if I change the number of iterations from 10,000 to 100,000 then:

func_diff_calc_numba: 1.67251
func_diff_calc_numba_v2: 4.85828
func_diff_calc_numba_v3: 11.63361

I was expecting vectorize and guvectorize to give almost the same speed-up, but while njit and guvectorize take roughly the same time, vectorize is ~2x slower than guvectorize and ~10x slower than njit. Is there something wrong in my implementation, or is something else going on?

mykd

1 Answer

The task (function + inputs) is probably too small/simple to be parallelized effectively, so the overhead of parallelization increases the total runtime. If you compile both for the default "cpu" target, the difference should disappear.

Because your input is 1D, with the given ufunc signature the guvectorize version doesn't parallelize anything: there is only a single task to distribute.

A like-for-like parallel comparison can be made by setting the signature to "(),(),(),(),()->()", which tells guvectorize to apply the function element-wise, just like vectorize does. Those results should be very close again, but then you'll see that the overhead of parallelization makes both slower in this case.

For me timings are:

  1. Using target="parallel" for both, and "(m),(m),(m),(),()->(m)":

    numba_guvec    : 0.26364
    numba_vec      : 3.26960
    
  2. Using target="cpu" for both, and "(m),(m),(m),(),()->(m)":

    numba_guvec    : 0.21886
    numba_vec      : 0.26198
    
  3. Using target="parallel" for both, and "(),(),(),(),()->()":

    numba_guvec    : 3.05748
    numba_vec      : 3.15587
    

You'll probably find similar behavior if you also compare @njit(parallel=True) together with numba.prange.

In the end, parallelizing something simply involves extra work, and that's only worth it for a sufficiently large (slow) task.

Rutger Kassies