Numba does not provide a `popcount` function yet. While such a function can be implemented, doing so is definitely not user-friendly (it requires delving into how Numba works and interacts with the llvmlite JIT) and not portable either (it depends on the target architecture). For more information, please read the documentation about intrinsics and this post. If you really want to take this path, note that while a new intrinsic function can be implemented in Numba for your needs, popcount does not appear to be supported yet by the llvmlite wrapping layer, and AFAIK the only solution is to call inline assembly directly from llvmlite, which is neither simple nor portable (it does not even work on all x86-64 CPUs).
`bit_length` is a method of `int` objects that only makes sense on variable-sized integers, but Numba (like Numpy) does not use such a type because of its very large overhead compared to native fixed-size numbers.
Numpy will soon add a `popcount` function, and Numba will likely support it later, but this is not yet the case. Once available, that will certainly be the best solution for future readers.
Fortunately, there is a way to get relatively fast code using Bit Twiddling Hacks, although the result will certainly not be as fast as dedicated instructions like `popcnt`, available on most recent x86-64 processors. For example, a popcount on unsigned 32-bit integers can be implemented in Numba using the following code:
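```python
import numba as nb
import numpy as np

# The signature is critical for the function to be correct
@nb.njit('int_(uint32)')
def popcount(v):
    # Count the set bits within each 2-bit pair
    v = v - ((v >> 1) & 0x55555555)
    # Sum adjacent pairs into 4-bit nibbles
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
    # Sum nibbles into bytes, then add the four bytes with a multiply;
    # the uint32 cast truncates the product so the top byte holds the total
    c = np.uint32(((v + (v >> 4)) & 0xF0F0F0F) * 0x1010101) >> 24
    return c

# Returns 17
popcount(0b00110101_00011110_00111010_11101001)
```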
On my machine, this is very fast, especially if the above function is put in a loop that can be automatically vectorized, since the JIT can then use AVX-2 SIMD instructions: it takes about 4 cycles/int without AVX-2 and about 1.5 cycles/int with it.
The same approach applies to finding the MSB and LSB. Note that bit twiddling hacks on integers with fewer bits should result in faster generated code (especially due to SIMD and the need for fewer instructions).
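The answer does not say which MSB/LSB hacks it has in mind, so the following is only one possible sketch, written in plain Python with explicit 32-bit masking rather than in Numba. The names `popcount32`, `lsb_index` and `msb_index` are hypothetical. Both helpers reduce the problem to a popcount: the LSB index is the number of zero bits below the lowest set bit, and the MSB index is found by smearing the highest set bit downwards and counting the resulting ones.

```python
MASK32 = 0xFFFFFFFF  # emulate unsigned 32-bit arithmetic in plain Python

def popcount32(v):
    """Number of set bits in the low 32 bits of v."""
    v &= MASK32
    v = v - ((v >> 1) & 0x55555555)
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
    return ((((v + (v >> 4)) & 0x0F0F0F0F) * 0x01010101) & MASK32) >> 24

def lsb_index(v):
    """Index of the lowest set bit, or -1 if no bit is set."""
    v &= MASK32
    if v == 0:
        return -1
    lowest = v & -v                # isolate the lowest set bit
    return popcount32(lowest - 1)  # count the zero bits below it

def msb_index(v):
    """Index of the highest set bit, or -1 if no bit is set."""
    v &= MASK32
    if v == 0:
        return -1
    # Copy the highest set bit into every lower position
    v |= v >> 1
    v |= v >> 2
    v |= v >> 4
    v |= v >> 8
    v |= v >> 16
    return popcount32(v) - 1
```

Porting these to Numba would need the same care with the signature and integer casts as the popcount above.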