
I want to represent a string of up to around 120 bits, and speed is critical. I need to be able to build a bitstring by repeated snoc operations, and then to consume it with repeated uncons operations. One idea is to steal the implementation of Word128 from data-dword and use something like this to build:

empty = 1
snoc xs x = (xs `shiftL` 1) .|. x

But the unconsing seems to get a bit ugly, having to first countLeadingZeros and shift left to eliminate them before being able to read off the elements by shifting and masking the high bits.
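For concreteness, here is a minimal sketch of the `countLeadingZeros` approach described above. It uses a single `Word64` in place of `Word128`, so it holds at most 63 bits; the names `BitQueue`, `emptyQ`, `snocQ`, `unconsQ` and `toListQ` are made up for illustration, and `countLeadingZeros` requires GHC >= 7.10. Instead of shifting the leading zeros away, this version locates the sentinel bit on every `unconsQ` and moves it down by one position, which is one possible way to arrange the work rather than a measured optimum.

```haskell
import Data.Bits (countLeadingZeros, finiteBitSize, unsafeShiftL, unsafeShiftR, (.&.), (.|.))
import Data.List (unfoldr)
import Data.Word (Word64)

-- Hypothetical one-word queue: the highest set bit is a sentinel,
-- and the queued bits sit below it, oldest first.
type BitQueue = Word64

emptyQ :: BitQueue
emptyQ = 1

-- Append a bit at the low end; the sentinel moves up by one position.
snocQ :: BitQueue -> Bool -> BitQueue
snocQ xs b = (xs `unsafeShiftL` 1) .|. (if b then 1 else 0)

-- Remove the oldest bit, which sits directly below the sentinel.
unconsQ :: BitQueue -> Maybe (Bool, BitQueue)
unconsQ xs
  | xs <= 1 = Nothing  -- 1 is the empty queue; 0 has no sentinel at all
  | otherwise =
      let s  = finiteBitSize xs - 1 - countLeadingZeros xs  -- sentinel index, >= 1
          h  = s - 1                                        -- index of the oldest bit
          hd = (xs `unsafeShiftR` h) .&. 1 == 1
          newSentinel = 1 `unsafeShiftL` h
          -- keep the bits below h and plant the sentinel where the head was
          tl = (xs .&. (newSentinel - 1)) .|. newSentinel
      in Just (hd, tl)

-- Drain the whole queue, oldest bit first.
toListQ :: BitQueue -> [Bool]
toListQ = unfoldr unconsQ
```

With these definitions, `toListQ (foldl snocQ emptyQ bits)` should give back `bits` for any list of at most 63 elements.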

Is there some more pleasant way that's at least as fast, or some faster way that's not too much more unpleasant?


Context

Phil Ruffwind has proposed a version of lens's at for Data.Map, but all implementations thus far are substantially slower than the naive implementation lens currently uses when key comparison is cheap. If I could produce a very cheap representation of the path to an entry while looking it up, and then consume it very efficiently with a specialized version of insert or delete, then maybe I could make this worthwhile.

dfeuer
  • Make your `Word128` into a [ring buffer](https://en.wikipedia.org/wiki/Circular_buffer), perhaps? Just store the index of the head and tail of your queue. – Daniel Wagner May 01 '16 at 21:16
  • @DanielWagner, that's an option. It takes an extra word for the index, but it's nicer. A ring buffer or any other full queue is a bit more than I need, since I do all my snoccing before I start unconsing. – dfeuer May 01 '16 at 21:21
  • @dfeuer how about store the length of the string in the extra 8 bits? You can read this length before snoc/uncons and write it after. – erisco May 01 '16 at 22:42
  • @erisco, that won't actually help. Using a separate word would be better, because unpacking and repacking that byte will use it anyway. – dfeuer May 01 '16 at 23:37
  • @dfeuer maybe it is unpacked and stored in a register rather than in cache or main memory though, which would be a win (a smaller memory footprint lets you fit more into your cache lines, if the data is contiguous). If performance is that critical then it might be worth finding out. – erisco May 02 '16 at 00:56
  • @erisco, good point. Performance is mission critical. Whether the mission is critical is an entirely different question. ;-) – dfeuer May 02 '16 at 01:09
  • I'm fairly confident `countLeadingZeros` will be the fastest, just make sure to use `unsafeShift` and avoid `testBit` and `setBit` (instead reimplement using unsafe shifts). – András Kovács May 02 '16 at 05:21
  • @AndrásKovács, may I ask why you expect that? I ask because it's only available in GHC >= 7.10. – dfeuer May 02 '16 at 19:18
  • @dfeuer I backtrack on my comment, because chi's solution seems about as fast as `clz`. – András Kovács May 03 '16 at 06:36
  • @AndrásKovács, my tests suggest `ctz` is faster for `uncons` with two words, but of course it's also possible my code is at fault. Dealing with two words is painful. I'm not sure if there's a good way to deal with carries. – dfeuer May 03 '16 at 06:39
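
The indexing idea from the comments above (keep the length in a separate word next to the bits) can be sketched as follows. This is an illustration under stated assumptions, not code from the thread: the type `CountedBits` and the functions `emptyC`, `snocC`, `unconsC` and `toListC` are hypothetical names, and a single `Word64` stands in for the two-word payload. Because all snocs happen before any uncons, no sentinel bit is needed, and uncons reduces to a shift, a mask and a decrement.

```haskell
import Data.Bits (unsafeShiftL, unsafeShiftR, (.&.), (.|.))
import Data.List (unfoldr)
import Data.Word (Word64)

-- Hypothetical representation: number of stored bits plus the bits themselves.
-- The oldest bit is the most significant of the stored ones.
data CountedBits = CountedBits !Int !Word64

emptyC :: CountedBits
emptyC = CountedBits 0 0

-- Append a bit at the low end (valid for up to 64 bits in this sketch).
snocC :: CountedBits -> Bool -> CountedBits
snocC (CountedBits n w) b =
  CountedBits (n + 1) ((w `unsafeShiftL` 1) .|. (if b then 1 else 0))

-- Read the oldest remaining bit at index n - 1; the word itself is left
-- untouched, because the count alone says which bits are still live.
unconsC :: CountedBits -> Maybe (Bool, CountedBits)
unconsC (CountedBits n w)
  | n <= 0    = Nothing
  | otherwise =
      let n' = n - 1
      in Just ((w `unsafeShiftR` n') .&. 1 == 1, CountedBits n' w)

-- Drain the whole queue, oldest bit first.
toListC :: CountedBits -> [Bool]
toListC = unfoldr unconsC
```

The trade-off discussed in the comments still applies: the count costs an extra word (or a field to unpack and repack), in exchange for an uncons that needs no leading- or trailing-zero count.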

1 Answer


I am not sure if this qualifies. I fear that I'm re-implementing countLeadingZeros in some form...

Anyway, the idea is to snoc bits from the left, shifting right. Then, we can "count" the trailing zeros of xs using xs - 1 and an XOR. The result of the "count" is a mask "00..01..11" which, roughly, is a unary representation of the trailing zeros. We do not convert this unary to binary since we have no need to: with some bit-level work, we can uncons.

Untested and unproven code follows.

import Data.Word
import Data.Bits
import Text.Printf

type T = Word64     -- can be adapted to any WordN

-- for pretty printing
pr :: T -> String
pr x = printf "%064b\n" x

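-- a lone sentinel bit; queued bits will sit above it
empty :: T
empty = shiftL 1 63

-- x must be 0 or 1; it enters at the top bit and the sentinel moves down
snoc :: T -> T -> T
snoc x xs = shiftR xs 1 .|. (shiftL x 63)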

-- returns (head, tail)
-- head is not normalized (0 or 1), only (0 or /=0)
uncons :: T -> (T, T)
uncons xs = 
   let -- example
       -- 0101001100000000000   xs  
       y = (xs `xor` (xs - 1))
       -- 0000000111111111111   y
       z = shiftR y 1 + 1
       -- 0000000100000000000   z
       z' = shiftL z 1
       -- 0000001000000000000   z'
   in (xs .&. z' , (xs .&. complement z) .|. z' )
chi
  • This works, indeed, and is rather elegant in its own way. Unfortunately, it seems that unconsing is rather slow, at least with the implementations I've come up with for double words for the relevant operations. I guess I'll try one of the indexing ideas next.... – dfeuer May 02 '16 at 21:59
  • I found [elsewhere online](http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/) that you can go straight to `z = xs .&. (- xs)`. Per Edward Kmett's advice, I decided to try using this 1-word variant with a `Map` size check to fall back. Unfortunately, it seems that the trouble of doing the size check is enough, or nearly enough, to negate the time saved by using a one-word queue. So I think I'm going to stick with the two-word `countTrailingZeros` one for `Data.Map`. Thanks again for your help. – dfeuer May 12 '16 at 17:20
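
As a sketch of the shortcut mentioned in the last comment, the answer's `uncons` can be rewritten so that the lowest set bit is isolated directly with `xs .&. negate xs`. The definitions of `T`, `empty` and `snoc` are repeated from the answer so the block stands alone; `unconsNeg` and `toBools` are hypothetical names added here, and like the original the code assumes a non-empty queue when unconsing.

```haskell
import Data.Bits (complement, shiftL, shiftR, (.&.), (.|.))
import Data.Word (Word64)

type T = Word64

-- As in the answer: a lone sentinel bit at the top.
empty :: T
empty = shiftL 1 63

-- As in the answer: x must be 0 or 1.
snoc :: T -> T -> T
snoc x xs = shiftR xs 1 .|. shiftL x 63

-- Variant of the answer's uncons. Returns (head, tail); as in the
-- original, head is 0 or non-zero rather than normalized to 0 or 1.
unconsNeg :: T -> (T, T)
unconsNeg xs =
  let z  = xs .&. negate xs  -- lowest set bit, i.e. the sentinel
      z' = shiftL z 1        -- position of the oldest queued bit
  in (xs .&. z', (xs .&. complement z) .|. z')

-- Hypothetical helper: drain a queue into a list, oldest bit first.
toBools :: T -> [Bool]
toBools xs
  | xs == empty = []
  | otherwise   = let (h, t) = unconsNeg xs in (h /= 0) : toBools t
```

For example, `toBools (foldl (flip snoc) empty [1, 0, 1])` should evaluate to `[True, False, True]`. This removes one shift and one addition per uncons compared with the `xor` formulation; whether that is measurable would need benchmarking.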