MSB() LSB() popcount() in numba

Question

I have a program that uses Numba and needs MSB(), LSB(), popcount(), etc. I can't find any reference to these in relation to Numba. How does one get built-in functions like Python's bit_length() working with Numba?

Unknown attribute 'bit_length' of type int64

File "v.py", line 116:
def msb(x):
    return int(int(x).bit_length()) - 1
    ^

@CryptoFool Certainly not, since the main answer is not efficient in Numba: it uses a cast to `float64`, which is quite expensive, and a `log2` function, which is very expensive. The cast back to int is a bit expensive too. Overall, it should easily take at least dozens of cycles on a recent processor, while a `popcount` has a throughput of 1 cycle on Intel processors and ~0.5 on AMD Zen ones (even ~0.25 on newer ones). On my Intel CoffeeLake machine, the `log2` takes 40 cycles and is not accurate for big numbers. It is 40 times slower than a popcount. On AMD it should be about 100 times slower. — Jérôme Richard, Feb 13 '22 at 10:53
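
As a small illustration of the accuracy point in the comment above, here is a plain-Python sketch (no Numba involved; the value of `x` is chosen only for the example). A `float64` has a 53-bit significand, so a 54-bit integer is rounded before `log2` ever sees it:

```python
import math

x = 2**54 - 1                          # 54 set bits, so the MSB index is 53
exact = x.bit_length() - 1             # integer-only computation
via_log2 = int(math.log2(float(x)))    # float64 rounds x up to 2**54 first

print(exact, via_log2)                 # prints: 53 54
```

The integer-only result is correct, while the `log2` route is off by one for this input.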

score 1 · Answer 1 · answered Feb 13 '22 at 12:09

Numba does not provide a popcount function yet. While such a function can be implemented, it will definitely not be user-friendly (it requires delving into the way Numba works and deals with the LLVM-Lite JIT), nor portable (it depends on the target architecture). For more information, please read the documentation about intrinsics and this post. If you really want to take this path, please note that while a new intrinsic function can be implemented in Numba for your needs, popcount appears not to be supported yet by the LLVM-Lite JIT wrapping layer, and AFAIK the only solution is to call inline assembly directly from LLVM-Lite, which is neither simple nor portable (it does not even work on all x86-64 CPUs).

bit_length is a method of int objects that only makes sense for variable-sized integers, but Numba (like NumPy) does not use such a type because of its very large overhead compared to native fixed-size numbers.

NumPy will soon add a popcount function, and Numba will likely support it later, but this is not yet the case. Once available, this will certainly be the best solution for future readers.

Fortunately, there is a way to get relatively fast code using Bit Twiddling Hacks, although it will certainly not be as fast as instructions like popcnt, which are available on most recent x86-64 processors. For example, a popcount on unsigned 32-bit integers can be implemented in Numba with the following code:

import numba as nb
import numpy as np

# The signature is critical for the function to be correct
@nb.njit('int_(uint32)')
def popcount(v):
    v = v - ((v >> 1) & 0x55555555)
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
    c = np.uint32((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24
    return c

# Returns 17
popcount(0b00110101_00011110_00111010_11101001)

On my machine, this is very fast, especially if the above function is put in a loop that can be automatically vectorized, since the JIT can then use AVX-2 SIMD instructions: it takes about 4 cycles/int without AVX-2 and about 1.5 cycles/int with it.

The same thing applies to the MSB and LSB. Note that bit twiddling hacks on integers with fewer bits should result in faster generated code (especially due to SIMD and the need for fewer instructions).
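
As a rough sketch of what that could look like for MSB and LSB, here is the same style of bit twiddling written as plain Python for 32-bit values. The names `MASK32`, `popcount32`, `msb32` and `lsb32` are made up for this illustration, and the explicit masking stands in for the fixed-width `uint32` arithmetic that the Numba signature provides. It has not been compiled with Numba here; decorating it the same way as `popcount` should be possible, but that is an assumption.

```python
MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned wrap-around in plain Python

def popcount32(v):
    """Number of set bits in a 32-bit value (same steps as popcount above)."""
    v &= MASK32
    v = v - ((v >> 1) & 0x55555555)
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
    return ((((v + (v >> 4)) & 0x0F0F0F0F) * 0x01010101) & MASK32) >> 24

def msb32(v):
    """Index of the highest set bit, or -1 when v == 0."""
    v &= MASK32
    # Smear the highest set bit into every lower position...
    v |= v >> 1
    v |= v >> 2
    v |= v >> 4
    v |= v >> 8
    v |= v >> 16
    # ...so the number of set bits is now (MSB index + 1).
    return popcount32(v) - 1

def lsb32(v):
    """Index of the lowest set bit, or -1 when v == 0."""
    v &= MASK32
    if v == 0:
        return -1
    # (v & -v) isolates the lowest set bit; subtracting 1 leaves a mask
    # of all the bits below it, whose count is the LSB index.
    return popcount32((v & -v) - 1)
```

For non-zero 32-bit inputs, `msb32(x)` gives the same result as the `int(x).bit_length() - 1` expression from the question, for example `msb32(0b1010)` is 3 and `lsb32(0b1010)` is 1.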

Thanks. It does in fact run faster than the native version. — Vic C, Feb 17 '22 at 05:50
