Why is numba not speeding up the following piece of code?

Question

import numpy as np
from numba import jit

@jit(nopython=True)
def sort(x):
    for i in range(1000):
        np.sort(x)

I thought numba was made for these sorts of tasks, where you have for loops combined with numpy operations. Yet this jitted function is 2-3x slower than the pure Python variant (i.e. the same function but without the jit), and yes I have run it after it was compiled.

Am I doing something wrong?

EDIT:

x has len = 5000, and I tried both dtype = int32 and dtype = float64.

How big is `x`? What is its data type? Please answer by editing the question directly. — Jérôme Richard, Aug 14 '21 at 21:39

score 2 · Accepted Answer · answered Aug 14 '21 at 22:35

The Numba implementation is not meant to be faster for relatively big arrays (e.g. more than 1024 items). Indeed, Numpy uses a compiled sorting algorithm just as Numba does (except that Numba compiles it with a JIT). Numba can only be better here for small arrays because it can mostly remove the overhead of calling a Numpy function from the CPython interpreter (and of performing many input checks). For an array of size=5000, the running time is dominated by the sorting calls and not by the overhead of the loop (see below).
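The claim that the sorting calls dominate the loop overhead can be checked with a rough measurement. The following is a minimal sketch using only Numpy and timeit; the helper names empty_loop and sort_loop are made up for illustration, and the absolute timings depend on the machine.

```python
import timeit
import numpy as np

def empty_loop(repeats=1000):
    # Only the interpreter overhead of the for loop itself.
    for _ in range(repeats):
        pass

def sort_loop(x, repeats=1000):
    # Same loop, but each iteration sorts a copy of x (np.sort returns a new array).
    for _ in range(repeats):
        np.sort(x)

rng = np.random.default_rng(0)
x = rng.integers(0, 1_000_000_000, size=5000, dtype=np.int32)

# Take the best of 5 runs to reduce noise.
loop_time = min(timeit.repeat(empty_loop, number=1, repeat=5))
sort_time = min(timeit.repeat(lambda: sort_loop(x), number=1, repeat=5))

print(f"empty loop: {loop_time * 1e3:.3f} ms")
print(f"sort loop : {sort_time * 1e3:.3f} ms")
```

If the loop overhead is only a tiny fraction of the total, then removing it (which is roughly what a JIT can do here) cannot change the running time much.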

Besides this, the two implementations appear to use slightly different variants of the sorting algorithm (at least not the same thresholds). As a result, they perform differently, and the difference depends on the input array. Some sorting algorithms are fast on specific kinds of distributions where other sorting algorithms are slow, and vice versa for other kinds of distributions.

Here is the execution time of the two implementations plotted against the array size, tested on random arrays on my machine (with 32-bit integers from 0 to 1,000,000,000):

One can see that Numba is faster for small arrays and slower for big ones. When len=5000, the Numba implementation is 50% slower.
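A size sweep of this kind could be measured roughly as follows. This is only a sketch under assumptions, not the benchmark used for the plot: the function names, the chosen sizes and the repeat counts are made up for illustration, and the Numba timing is skipped when Numba is not installed.

```python
import timeit
import numpy as np

try:
    from numba import njit  # optional: Numba timings are skipped if it is missing
except ImportError:
    njit = None

def numpy_sort_loop(x, repeats):
    # Pure Python loop calling Numpy's compiled sort.
    for _ in range(repeats):
        np.sort(x)

# The jitted variant has the same body; it only exists when Numba is installed.
numba_sort_loop = njit(numpy_sort_loop) if njit is not None else None

def benchmark(sizes=(16, 256, 5000), repeats=100):
    rng = np.random.default_rng(0)
    results = {}
    for n in sizes:
        x = rng.integers(0, 1_000_000_000, size=n, dtype=np.int32)
        timings = {
            "numpy": timeit.timeit(lambda: numpy_sort_loop(x, repeats), number=5) / 5
        }
        if numba_sort_loop is not None:
            numba_sort_loop(x, repeats)  # warm-up call so compilation is not timed
            timings["numba"] = (
                timeit.timeit(lambda: numba_sort_loop(x, repeats), number=5) / 5
            )
        results[n] = timings
    return results

if __name__ == "__main__":
    for n, timings in benchmark().items():
        print(n, timings)
```

The results depend on the machine, the library versions and the input distribution, so the crossover point between the two implementations may differ from the one described here.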

Note that you can choose the algorithm used with the kind parameter. Note also that some optimized Numpy implementations use parallelism so that primitives can run faster. In that case, the comparison with the Numba implementation is not fair, as Numba should use a sequential implementation (especially if parallel=True is not set). Besides this, this problem appears to be a well-known issue and developers are working on it.
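As an illustration of the kind parameter, here is a minimal sketch using Numpy's own np.sort outside of any jitted function. Whether a jitted np.sort call accepts this argument is not covered here, so check the Numba documentation for your version.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.integers(0, 1_000_000_000, size=5000, dtype=np.int32)

reference = np.sort(x)  # default kind is "quicksort"

# Every kind returns the same sorted values; only the algorithm
# (and therefore the speed on a given input) differs.
for kind in ("quicksort", "mergesort", "heapsort", "stable"):
    result = np.sort(x, kind=kind)
    assert np.array_equal(result, reference)
```

Timing each kind on data that resembles the real input is the only reliable way to pick one, since the ranking depends on the distribution.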

the statement that sorting dominates over the for loop made it click for me, ty that makes sense — michael, Aug 14 '21 at 22:42

score 0 · Answer 2 · answered Aug 14 '21 at 22:10

I wouldn't expect any performance benefit either. Numba isn't a magic wand that gives you better performance just by adding it. It has an overhead that can easily sneak up on you. It helps to understand what exactly Numba does: it analyzes the bytecode of a Python function and compiles it to native code using LLVM. For a lot of non-trivial cases this makes a huge difference because, honestly, Python sucks at complex math and branching. That is a reasonable drawback of its design choices. Take a look at your code, though. It is a Numpy sort function inside a for loop. Think about what optimisation Numba could possibly make to speed this up. Remember that Numpy is already damn fast and Numba can't really affect that performance. So you have essentially added overhead to the most critical part of your code, hence the loss in performance.

Niteya Shah

what I don't get is why numba can't optimize the for loop? aren't for loops slow in pure python? I get that numba-sort can't do better than numpy-sort, but I'm doing it 1000 times in a loop. why can't numba make that loop faster? and if numba can't do that, what's the point? it seems useless then, I can just use numpy... then nobody should ever use numba, just learn to use numpy... – michael Aug 14 '21 at 22:17
thinking more about it, you are definitely wrong... numba SHOULD be able to make the loop faster, otherwise it has no basis to exist, as one can always use numpy in a for loop if that's faster – michael Aug 14 '21 at 22:22
One hard rule for most compilers is to assume that most coders know what they are doing when they write code. So when you call it a thousand times, numba will not change that. It is how numba works internally. My guess is that you think numba can just cache the result once and then use it a thousand times. That would be a wise optimisation, but it isn't made because numba can't know for sure whether x is changed between the calls. Also remember that python math calculation is slow, not running a for loop. A for loop in python is just as fast as in C. – Niteya Shah Aug 14 '21 at 22:27
Numba is a library designed to speed up the work of developers. It never claimed that it would outperform numpy. Also remember that numpy can't be used for all cases. Take a look at this answer https://stackoverflow.com/questions/65921720/efficient-way-to-get-all-numpy-slices-for-different-ranges/65922255#65922255 Just see how much work the other answer had to do to get a decent result, and yet it was beaten trivially by numba. You should of course use numpy whenever possible, but there will be cases where it is impossible or very hard to express the computation in numpy code – Niteya Shah Aug 14 '21 at 22:30
Niteya Shah, you keep arguing against yourself. First you say numba can't beat numpy, then you literally link to a question where the numba version beats the numpy version. – michael Aug 15 '21 at 07:54
