
The popcount function returns the number of 1 bits in its input. For example, 0010 1101 has a popcount of 4.

Currently, I am using this algorithm to get the popcount:

private int PopCount(int x)
{
    x = x - ((x >> 1) & 0x55555555);
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
    return (((x + (x >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

This works fine. The only reason I'm asking for more is that this operation runs very often, and I'm looking for additional performance gains.

I'm looking for a way to simplify the algorithm based on the fact that my 1's will always be right aligned. That is, the input will be something like 00000 11111 (returns 5) or 00000 11111 11111 (returns 10).

Is there a way to make a more efficient popcount based on this constraint? If the input were 01011 11101 10011, it would just return 2, because it only cares about the right-most run of ones. Any kind of looping seems to be slower than the existing solution.

Ryan Peschel
  • In other words, you're asking whether a "count leading zeros" (clz) operation is faster than "count all ones" (popcnt)? – Ben Voigt Mar 28 '15 at 21:07
  • You could calculate it as `trailing_zeroes(~x)`. That would be nice in x86 assembly, but in C# it's still annoying. You could try that with the "double hack", but I don't expect much. – harold Mar 28 '15 at 21:10
  • @harold: Or `lg2(x+1)`. There are some fast binary logarithm hacks. – Ben Voigt Mar 28 '15 at 21:11
  • e.g. https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogLookup or https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightFloatCast – Ben Voigt Mar 28 '15 at 21:13
  • The count trailing zeros with `~x` looks promising. I'll try benchmarking the difference to see which is preferable. – Ryan Peschel Mar 28 '15 at 21:14
  • Ah, looks like `popcount` is still 5%-10% faster. – Ryan Peschel Mar 28 '15 at 21:21
  • @Ryan: Which exact algorithm did you test? – Ben Voigt Mar 28 '15 at 21:22
  • @BenVoigt: The one that just does `return Mod37BitPosition[(-v & v) % 37]` given `~v` as an input from https://graphics.stanford.edu/~seander/bithacks.html#ZerosOnRightModLookup. I'm trying some other ones now. I can't seem to get yours to give the proper output though. – Ryan Peschel Mar 28 '15 at 21:25
  • This is also interesting: https://www.klittlepage.com/2013/12/21/twelve-days-2013-de-bruijn-sequences/ I think it's the theory behind the one you tried. Modulo (%) is obviously going to be pretty slow. – Ben Voigt Mar 28 '15 at 21:26
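The `trailing_zeroes(~x)` idea from the comments, combined with the mod-37 lookup from the bithacks page, could be sketched in C# roughly as follows. This is only an illustration under stated assumptions: the class name `RightAlignedOnes` and the helper `CountTrailingOnes` are hypothetical, and the table is the one published in the linked `ZerosOnRightModLookup` entry. As noted above, the modulo makes this unlikely to beat the existing PopCount.

```csharp
using System;

static class RightAlignedOnes
{
    // Table from the bithacks "ZerosOnRightModLookup" entry:
    // index (2^k % 37) holds k, and index 0 holds 32 for a zero input.
    private static readonly int[] Mod37BitPosition =
    {
        32, 0, 1, 26, 2, 23, 27, 0, 3, 16, 24, 30, 28, 11, 0, 13, 4,
        7, 17, 0, 25, 22, 31, 15, 29, 10, 12, 6, 0, 21, 14, 9, 5,
        20, 8, 19, 18
    };

    // Counts the trailing zero bits of v (returns 32 when v == 0).
    public static int TrailingZeros(uint v)
    {
        // v & -v isolates the lowest set bit. It is written as ~v + 1
        // because unary minus on a uint widens to long in C#.
        uint lowest = unchecked(v & (~v + 1u));
        return Mod37BitPosition[lowest % 37];
    }

    // Hypothetical helper: length of the run of ones at the low end of x.
    public static int CountTrailingOnes(uint x) => TrailingZeros(~x);
}
```

With this sketch, `RightAlignedOnes.CountTrailingOnes(0x1F)` gives 5, and the question's mixed input 01011 11101 10011 (0x2FB3) gives 2, since only the right-most run is counted.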

1 Answer


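Here's a C# implementation that performs "find highest set" (binary logarithm). Because your ones are right-aligned, the 1-based position of the highest set bit equals the number of ones. It may or may not be faster than your current PopCount, but it is surely slower than using the real clz and/or popcnt CPU instructions: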

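static int FindMSB( uint input )
{
    // 0.0 has an all-zero exponent field, so zero is handled separately.
    if (input == 0) return 0;
    // Converting to double normalizes the value. Bits 52..62 hold the exponent
    // biased by 1023, so subtracting 1022 yields the 1-based index of the highest set bit.
    return (int)(BitConverter.DoubleToInt64Bits(input) >> 52) - 1022;
}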

Test: http://rextester.com/AOXD85351
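As a quick sanity check, the branching `FindMSB` can be compared against the question's `PopCount` on all 33 right-aligned inputs (zero through thirty-two ones). The sketch below is a minimal harness under that assumption; the class name `FindMsbCheck` and the helper `AgreeOnRightAlignedInputs` are hypothetical names introduced for illustration, and the two functions are copied from the question and the answer.

```csharp
using System;

static class FindMsbCheck
{
    // SWAR popcount, copied from the question.
    private static int PopCount(int x)
    {
        x = x - ((x >> 1) & 0x55555555);
        x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
        return (((x + (x >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    }

    // Branching "find highest set", copied from the answer.
    private static int FindMSB(uint input)
    {
        if (input == 0) return 0;
        return (int)(BitConverter.DoubleToInt64Bits(input) >> 52) - 1022;
    }

    // Hypothetical helper: true when both functions return n for every
    // input made of n right-aligned ones, n = 0..32.
    public static bool AgreeOnRightAlignedInputs()
    {
        for (int n = 0; n <= 32; n++)
        {
            // C# masks 32-bit shift counts to 5 bits, so (1u << 32) == 1u;
            // the all-ones case is therefore special-cased.
            uint x = n == 32 ? uint.MaxValue : (1u << n) - 1u;
            if (FindMSB(x) != n || PopCount(unchecked((int)x)) != n)
                return false;
        }
        return true;
    }
}
```

Calling `FindMsbCheck.AgreeOnRightAlignedInputs()` only checks correctness on the constrained inputs; it says nothing about which version is faster, which still needs a benchmark on the target runtime.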

And a slight variation without a conditional branch:

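/* precondition: ones are right-justified, e.g. 00000111 or 00111111 */
static int FindMSB( uint input )
{
    // For n right-justified ones (input == 2^n - 1) the exponent term equals n, and n & input == n.
    // For input == 0 the term is -1022, and 0 & -1022 == 0, so no branch is needed.
    return (int)(input & ((int)(BitConverter.DoubleToInt64Bits(input) >> 52) - 1022));
}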
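For comparison, here is a sketch of what the "real" instructions mentioned above look like from C#. It assumes a runtime that ships `System.Numerics.BitOperations` (.NET Core 3.0 or later, which postdates this thread); the class name `IntrinsicCount` and its method names are hypothetical. Whether the JIT actually emits lzcnt, tzcnt, or popcnt depends on the target CPU, so treat this as something to benchmark rather than a guaranteed win.

```csharp
using System;
using System.Numerics;

static class IntrinsicCount
{
    // 32 minus the leading-zero count is the 1-based index of the
    // highest set bit, which equals the count for right-aligned ones.
    public static int ViaLeadingZeros(uint x) =>
        32 - BitOperations.LeadingZeroCount(x);

    // Trailing zeros of the complement give the length of the low run
    // of ones, even when higher bits are also set.
    public static int ViaTrailingZeros(uint x) =>
        BitOperations.TrailingZeroCount(~x);

    // Plain population count, for reference.
    public static int ViaPopCount(uint x) =>
        BitOperations.PopCount(x);
}
```

The three methods agree on right-aligned inputs such as 0x1F (5) and 0x3FF (10). They differ on mixed inputs like 0x2FB3: only `ViaTrailingZeros` returns 2 there, matching the "right-most ones only" behavior described in the question.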
Ben Voigt