numba slower for numpy.bitwise_and on boolean arrays

Question

I am trying numba in this code snippet

from numba import jit
import numpy as np
from time import time
db  = np.array(np.random.randint(2, size=(int(400e3), 4)), dtype=bool)
out = np.zeros((int(400e3), 1))

@jit()
def check_mask(db, out, mask=[1, 0, 1]):
    for idx, line in enumerate(db):
        target, vector = line[0], line[1:]
        if (mask == np.bitwise_and(mask, vector)).all():
            if target == 1:
                out[idx] = 1
    return out

st = time()
res = check_mask(db, out, [1, 0, 1])
print('with jit: {:.4} sec'.format(time() - st))

With the numba @jit() decorator, this code runs slower!

without jit: 3.16 sec
with jit: 3.81 sec

To help clarify the purpose of this code:

db = np.array([           # out value for mask = [1, 0, 1]
    # target,  vector     #
      [1,      1, 0, 1],  # 1
      [0,      1, 1, 1],  # 0 (fit to mask but target == 0)
      [0,      0, 1, 0],  # 0
      [1,      1, 0, 1],  # 1
      [0,      1, 1, 0],  # 0
      [1,      0, 0, 0],  # 0
      ])
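
The rule in the table above can be checked with a small reference loop. This is only a sketch: the helper name `expected_out` is hypothetical, and it uses a 1-D integer `out` for readability rather than the `(N, 1)` float array from the snippet.

```python
import numpy as np

# Example rows from the table: column 0 is the target, columns 1..3 the vector.
db = np.array([
    [1, 1, 0, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
], dtype=bool)
mask = np.array([1, 0, 1], dtype=bool)

def expected_out(db, mask):
    """Plain NumPy reference (hypothetical helper, not part of the question)."""
    out = np.zeros(db.shape[0], dtype=int)
    for idx, line in enumerate(db):
        target, vector = line[0], line[1:]
        # A row fits when every position set in the mask is also set in the vector.
        fits = (mask == np.bitwise_and(mask, vector)).all()
        if fits and target:
            out[idx] = 1
    return out

print(expected_out(db, mask).tolist())  # [1, 0, 0, 1, 0, 0]
```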

Look at the `array_equal` code, as shown in my recent answer, http://stackoverflow.com/a/34486522/901925. — hpaulj, Dec 28 '15 at 21:23
Thanks @hpaulj I've just updated the snippet to take into account your comment — user3313834, Dec 28 '15 at 21:46

score 5 · Accepted Answer · answered Dec 29 '15 at 20:22

Numba has two compilation modes for jit: nopython mode and object mode. Nopython mode (the default) supports only a limited set of Python and Numpy features; refer to the docs for your version. If the jitted function contains unsupported code, Numba has to fall back to object mode, which is much, much slower.

I'm not sure if object mode is supposed to give a speedup compared to pure Python, but you'll always want to use nopython mode anyway. To make sure nopython mode is used, specify nopython=True and stick to very basic code (rule of thumb: write out all the loops and only use scalars and Numpy arrays):

@jit(nopython=True)
def check_mask_2(db, out, mask=np.array([1, 0, 1])):
    for idx in range(db.shape[0]):
        # skip rows whose target (column 0) is not set
        if db[idx,0] != 1:
            continue
        check = 1
        # loop over the mask entries only; mask[j] pairs with column j+1 of db
        for j in range(mask.shape[0]):
            if mask[j] and not db[idx,j+1]:
                check = 0
                break
        out[idx] = check
    return out

Writing out the inner loop explicitly also has the advantage that we can break out of it as soon as the condition fails.

Timings:

%time _ = check_mask(db, out, np.array([1, 0, 1]))
# Wall time: 1.91 s
%time _ = check_mask_2(db, out, np.array([1, 0, 1]))
# Wall time: 310 ms  # slow because of compilation
%time _ = check_mask_2(db, out, np.array([1, 0, 1]))
# Wall time: 3 ms
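
Because the first call includes compilation, a fair measurement calls the function once before timing it. The following is a minimal sketch of that pattern outside IPython. The function `count_fits` and the array sizes are made up for illustration, and the `ImportError` fallback is an assumption so the script still runs (uncompiled) when Numba is not installed.

```python
import time
import numpy as np

try:
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption for this sketch: without Numba, run the plain Python function.
    def njit(func):
        return func

@njit
def count_fits(db, mask):
    # Count rows whose target is set and whose vector covers the mask.
    n = 0
    for i in range(db.shape[0]):
        if not db[i, 0]:
            continue
        ok = True
        for j in range(mask.shape[0]):
            if mask[j] and not db[i, j + 1]:
                ok = False
                break
        if ok:
            n += 1
    return n

db = np.random.randint(2, size=(20000, 4)).astype(bool)
mask = np.array([True, False, True])

count_fits(db, mask)                 # warm-up call: triggers compilation
start = time.perf_counter()
n = count_fits(db, mask)             # timed call: runs the compiled code only
elapsed = time.perf_counter() - start
print("{} matching rows in {:.6f} s".format(n, elapsed))
```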

BTW, the function is also easily vectorized with Numpy, which gives a decent speed:

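def check_mask_vectorized(db, mask=[1, 0, 1]):
    mask = np.asarray(mask, dtype=bool)
    # a row fits when every column set in the mask is also set in the row
    check = ((db[:,1:] & mask) == mask).all(axis=1)
    out = (db[:,0] == 1) & check
    return out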

%time _ = check_mask_vectorized(db, [1, 0, 1])
# Wall time: 14 ms
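
One way to sanity-check a vectorized variant is to compare it against a plain loop on random data. The sketch below is self-contained and uses its own hypothetical helper names (`fits_loop`, `fits_vectorized`). It assumes the rule from the question: the target must be set, and every position set in the mask must also be set in the row.

```python
import numpy as np

def fits_loop(db, mask):
    """Row-by-row reference (hypothetical helper)."""
    out = np.zeros(db.shape[0], dtype=bool)
    for i in range(db.shape[0]):
        ok = bool(db[i, 0])
        for j in range(mask.shape[0]):
            if mask[j] and not db[i, j + 1]:
                ok = False
                break
        out[i] = ok
    return out

def fits_vectorized(db, mask):
    # Every masked column must be set: (vector OR NOT mask) is True everywhere.
    return db[:, 0] & (db[:, 1:] | ~mask).all(axis=1)

rng = np.random.RandomState(0)
db = rng.randint(2, size=(1000, 4)).astype(bool)
mask = np.array([1, 0, 1], dtype=bool)

# Both formulations should agree on every row.
assert np.array_equal(fits_loop(db, mask), fits_vectorized(db, mask))
```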

Thanks. Using your advice on a more complicated but similar problem, I did not manage to get a speedup with numba: http://stackoverflow.com/q/34544210/3313834 — user3313834, Dec 31 '15 at 09:34

score 5 · Answer 2 · edited Apr 30 '18 at 02:36

Alternatively, you can try Pythran (disclaimer: I am a developer of Pythran).

With a single annotation, it compiles the following code

#pythran export check_mask(bool[][], bool[])

import numpy as np
def check_mask(db, out, mask=[1, 0, 1]):
    for idx, line in enumerate(db):
        target, vector = line[0], line[1:]
        if (mask == np.bitwise_and(mask, vector)).all():
            if target == 1:
                out[idx] = 1
    return out

with a call to pythran check_call.py.

And according to timeit, the resulting native module runs pretty fast:

python -m timeit -s 'n=int(1e4); import numpy as np; db  = np.array(np.random.randint(2, size=(n, 4)), dtype=bool); out = np.zeros(n, dtype=bool); from eq import check_mask' 'check_mask(db, out)'

tells me the CPython version runs in 136 ms while the Pythran-compiled version runs in 450 µs.

I arrived at this question http://stackoverflow.com/q/35240168/3313834 while doing it with pythran. BTW pythran deserves a tag on stackoverflow — user3313834, Feb 06 '16 at 10:58

score 1 · Answer 3 · answered Dec 29 '15 at 18:50

I would recommend removing the numpy call to array_equal from the inner loop. numba isn't necessarily smart enough to turn this into a piece of inlined C; and should it fail to replace this call, the dominant cost of your function remains comparable, which would explain your result.

While numba can reason about a fair number of numpy constructs, it is only C-style code acting on numpy arrays which one may rely on being accelerated.

Eelco Hoogendoorn

yes, that's why I've removed the array_equal, replaced by np.bitwise_and(mask, vector) – user3313834 Dec 29 '15 at 18:53
I am not sure this is a significant difference to numba. You would probably need to manually perform the loop over all components of the mask in order to avoid python calls in the inner loop – Eelco Hoogendoorn Dec 29 '15 at 20:03
