This is a follow-up to my last question. I managed to implement a pretty fast "bit queue builder" as

data BitQueueB = BQB !Word !Word

by pushing elements in from the left (starting with a guard bit), then when I'm done "building" the queue, counting the trailing zeros and shifting them off, installing a new guard bit on the left.

However, this solution doesn't really do anything for GHC < 7.10, since that's when countTrailingZeros was introduced. I also can't help wondering if there's some more magical way to accomplish this shift, or its leftward counterpart.

The point

I have a two-word representation of a double word, guaranteed to have at least one bit set. I'm looking for the fastest way to shift the double word to either the left or the right until the first set bit is shifted off, without using either countLeadingZeros or countTrailingZeros.
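For concreteness, here is a minimal sketch of the operation being asked about, written with countTrailingZeros as the baseline to be replaced. The name shiftOffLowestSet and the (high word, low word) argument order are assumptions made for illustration, not part of the actual BitQueueB code, and guard-bit handling is left out.

```haskell
import Data.Bits
import Data.Word

-- Hypothetical helper: the double word is given as (hi, lo) with at least
-- one bit set. Shifts right until the lowest set bit has been shifted off.
-- Relies on shiftR/shiftL on Word returning 0 for shift amounts >= the width.
shiftOffLowestSet :: Word -> Word -> (Word, Word)
shiftOffLowestSet hi lo
  | lo /= 0 =
      let n = countTrailingZeros lo + 1      -- total shift, between 1 and w
      in (hi `shiftR` n, (lo `shiftR` n) .|. (hi `shiftL` (w - n)))
  | otherwise =
      let n = countTrailingZeros hi + 1      -- the whole low word is discarded
      in (0, hi `shiftR` n)
  where
    w = finiteBitSize lo
```

For example, shiftOffLowestSet 3 0 drops the entire low word plus one bit of the high word, leaving (0, 1).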

One thought

My poor intuition in these matters suggests that there could be some way to use multiplication if I switch to shifting left, but maybe that's just wishful thinking.

dfeuer
  • Maybe there's something here which can help: https://en.wikipedia.org/wiki/Find_first_set – chi May 04 '16 at 19:37
  • Well I don't really know Haskell, but there is `x >> ctz(x) = x / (x & -x)` (iff `x != 0`). Then kick out one more bit. (there is a related one to the left but it's "less cute") – harold May 04 '16 at 19:55
  • @harold, what's the one to the left look like? – dfeuer May 04 '16 at 20:01
  • Longer so I guess I'll write up an answer – harold May 04 '16 at 20:04
  • @chi, those look like they could be some good options. – dfeuer May 04 '16 at 20:11
  • @dfeuer `ctz(x) = countTrailingZeros x`, so that's an equivalent option – chi May 04 '16 at 20:22
  • @chi, the implementation using `popCount` could be just the ticket for `4.5 <= base < 4.8`. – dfeuer May 04 '16 at 20:59
  • @dfeuer I wonder how hard it would be to add a custom near-assembly (llvm?) implementation of `countTrailingZeros` for those versions. One could just use the FFI and code everything in C, but that might be slow, and surely it won't be inlined. – chi May 05 '16 at 07:37
  • @chi, I don't know assembly, sadly, and don't really know how to learn it. The FFI overhead would almost certainly blow away any performance benefit. – dfeuer May 05 '16 at 17:08
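The popCount route mentioned in the comments can be sketched as follows. It uses the identity ctz(x) = popcount((x & -x) - 1) from the linked Find first set article. The function name ctzViaPopCount is made up for illustration, and whether this beats other fallbacks on older base versions has not been measured here.

```haskell
import Data.Bits
import Data.Word

-- Hypothetical fallback for base versions that have popCount but lack
-- countTrailingZeros. x .&. negate x isolates the lowest set bit, and
-- subtracting 1 turns it into a mask of exactly the trailing zero positions.
ctzViaPopCount :: Word -> Int
ctzViaPopCount x = popCount ((x .&. negate x) - 1)
```

For x = 0 the subtraction wraps to all ones, so the result is the word size, which matches what countTrailingZeros returns for 0.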

1 Answer

To the left is more annoying. Not actually tested:

m = x
m |= m >> 1
m |= m >> 2
m |= m >> 4
m |= m >> 8
m |= m >> 16
x = x * (0x80000000 / ((m + 1) >> 1))

This moves the highest set bit to the top by first computing the highest power of two present in the number, and then multiplying by the amount necessary to turn that into the highest possible power of two. It doesn't like it when the top bit is already set (it would divide by 0) and it needs the number of bits to be known. So it's probably better to just count the leading zeros instead of this, or maybe there's something better but I don't know of it.
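A Haskell transcription of this idea might look like the sketch below, fixed to Word32. It takes the divisor to be the highest power of two present in x, computed as (m + 1) `shiftR` 1 from the smeared value m. That reading is an assumption based on the description above, chosen because it reproduces the stated division by zero when the top bit is already set. The names smear and normalizeLeft are hypothetical.

```haskell
import Data.Bits
import Data.Word

-- Copy the highest set bit into every lower position.
smear :: Word32 -> Word32
smear x0 = foldl (\m k -> m .|. (m `shiftR` k)) x0 [1, 2, 4, 8, 16]

-- Move the highest set bit of x up to bit 31.
-- Precondition: x /= 0 and bit 31 of x is clear; otherwise the divisor
-- below is 0 and quot throws a divide-by-zero exception.
normalizeLeft :: Word32 -> Word32
normalizeLeft x = x * (0x80000000 `quot` highestPow)
  where
    m          = smear x
    highestPow = (m + 1) `shiftR` 1   -- highest power of two present in x
```

A further shiftL by 1 then discards the top bit itself.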

The version to the right (for completeness?) is nicer, just

x / (x & -x)

Where x & -x is a relatively well-known trick to isolate the lowest set bit.
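In Haskell, for a single Word, the same trick can be sketched like this. The function names are invented for illustration, and x is assumed to be nonzero, since otherwise the division is by zero.

```haskell
import Data.Bits
import Data.Word

-- Isolate the lowest set bit; negate wraps around for Word.
lowestSetBit :: Word -> Word
lowestSetBit x = x .&. negate x

-- Divide out the lowest set bit, which shifts off all trailing zeros.
-- Precondition: x /= 0.
rightJustifyDiv :: Word -> Word
rightJustifyDiv x = x `quot` lowestSetBit x
```

For example, rightJustifyDiv 40 is 5, because 40 is 101000 in binary and its lowest set bit is 8.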

harold
  • Ah, the only reason I was interested in going left was because I thought maybe it could avoid the division. Oh well. While this is pretty cool, I don't think it could actually help me--implementing arbitrary division of a two-word representation of a double word doesn't seem likely to be remotely pleasant or efficient! – dfeuer May 04 '16 at 20:30
  • Yes that would hurt. By the way, there is also a shortcut to "right-justify" by doing almost a count-trailing-zeros but instead of counting them only do the justification (which one of the ctz algorithms naturally produces anyway, but then ignores) – harold May 04 '16 at 20:40
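One way to read that last suggestion is as the binary-search style of count-trailing-zeros with the counter dropped: test whether the low 32, 16, 8, 4, 2, 1 bits are all zero and shift each such block off. The sketch below does this for a single Word64; the name rightJustifySteps is hypothetical, and extending it to the two-word representation is left open.

```haskell
import Data.Bits
import Data.Word

-- Shift off all trailing zeros without counting them or dividing.
-- Each step removes a block of k zero bits if the low k bits are all zero.
-- For x0 == 0 the result is 0.
rightJustifySteps :: Word64 -> Word64
rightJustifySteps x0 = foldl step x0 [32, 16, 8, 4, 2, 1]
  where
    step x k
      | x .&. (bit k - 1) == 0 = x `shiftR` k
      | otherwise              = x
```

After the six steps the lowest set bit sits at position 0, so one more shiftR by 1 shifts it off.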