If you just want to count the 1-bits in an unsigned int and use gcc, have a look at the builtin functions (here: int __builtin_popcount(unsigned int x)). These can be expected to be highly optimized and to use special CPU instructions where available. (One could very well test for gcc.)
However, I am not sure what get_num() would yield. It does not seem to depend on mask, so its output can be used to limit the result of popcount.
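A minimal sketch of how the builtin could be combined with a width limit, assuming the width is available as a plain bit count. The name popcount_builtin is hypothetical. The `l` variant of the builtin is used here so that all 32 bits survive even where unsigned int is only 16 bits wide, and compilers that do not define __GNUC__ fall back to a plain loop.

```c
#include <stdint.h>

/* Hypothetical helper: counts the 1-bits in the low `width` bits of value. */
static unsigned popcount_builtin(uint32_t value, unsigned width)
{
    if (width < 32)
        value &= (UINT32_C(1) << width) - 1;   /* limit to actual width */

#if defined(__GNUC__)
    /* gcc (and compatible compilers): let the compiler pick the best code. */
    return (unsigned)__builtin_popcountl((unsigned long)value);
#else
    /* Portable fallback: add up the lowest bit until nothing is left. */
    unsigned cnt = 0;
    for ( ; value; value >>= 1)
        cnt += (unsigned)(value & 1U);
    return cnt;
#endif
}
```

For example, popcount_builtin(0xFF, 4) only looks at the low four bits.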
The following uses a loop and might be faster than a parallel-add tree on some architectures (one should profile both versions if timing is essential).
#include <stdint.h>

unsigned popcount(uint32_t value, unsigned width)
{
    unsigned cnt = 0;   // size of unsigned intentionally left to the architecture

    if ( width < 32 )
        value &= (1UL << width) - 1;    // limit to actual width

    // one iteration per bit position up to the highest set bit
    for ( ; value ; value >>= 1 ) {
        cnt += value & 1U;  // avoids a branch
    }
    return cnt;
}
Note the width is passed to the function.
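For profiling against the loop, this is one common form of the parallel-add tree mentioned above. It is a sketch under the same assumptions (32-bit value, width passed in), and popcount_tree is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical comparison version: parallel-add tree for a 32-bit value. */
static unsigned popcount_tree(uint32_t value, unsigned width)
{
    if (width < 32)
        value &= (UINT32_C(1) << width) - 1;   /* limit to actual width */

    /* Each 2-bit field now holds the count of its two bits. */
    value = value - ((value >> 1) & UINT32_C(0x55555555));
    /* Each 4-bit field now holds the count of its four bits. */
    value = (value & UINT32_C(0x33333333)) + ((value >> 2) & UINT32_C(0x33333333));
    /* Each byte now holds the count of its eight bits (0..8). */
    value = (value + (value >> 4)) & UINT32_C(0x0F0F0F0F);
    /* Sum the four byte counts into the lowest byte. */
    value += value >> 8;
    value += value >> 16;

    return (unsigned)(value & 0x3FU);          /* result is at most 32 */
}
```

The tree takes a fixed number of steps regardless of the input, while the loop stops once the remaining bits are all zero, so which one wins depends on the data and the target.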
On architectures with a native word size below 32 bits (PIC, AVR, MSP430, etc.), versions specialized for the actual data width will give much better results than a single generic version.
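As an illustration of what such a specialized version might look like, here is a sketch for 8-bit values using a 16-entry nibble table. The names popcount8 and nibble_bits are hypothetical, and whether a table beats a loop on a given part (and where the table should live in memory) is target-specific and worth measuring.

```c
#include <stdint.h>

/* Number of 1-bits in each possible 4-bit value (hypothetical table name). */
static const uint8_t nibble_bits[16] = {
    0, 1, 1, 2, 1, 2, 2, 3,
    1, 2, 2, 3, 2, 3, 3, 4
};

/* Hypothetical 8-bit version: only byte-sized operations and two lookups. */
static uint8_t popcount8(uint8_t value, uint8_t width)
{
    if (width < 8)
        value &= (uint8_t)((1U << width) - 1U);   /* limit to actual width */

    return (uint8_t)(nibble_bits[value & 0x0FU] + nibble_bits[value >> 4]);
}
```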