
I need to read data from a stream using the following algorithm:

- Count all consecutive set bits ("1"s) from the stream.

- Then read k more bits from the stream. k is variable and changes throughout the program. Let's call the data read "m".

The decoded number is then

number = (consecutive_set_bits << k) + m;

This algorithm is executed a very large number of times, so it is crucial that this piece of code be as fast as possible.

The main problem is that the number of coded numbers packed into one byte, two bytes, four bytes, etc. is variable, so the trivial implementation (the only one I can think of right now) needs a loop that reads single bits from the stream. In the worst case, that is 14 loop iterations for a single coded coefficient.
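For reference, the trivial bit-by-bit implementation might look like the sketch below. It rests on assumptions the description above does not settle: bits are taken most significant bit first within each byte, the single 0 that ends the run of 1s is consumed and is not part of m, and k < 32. `BitReader`, `read_bit` and `decode_naive` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader state: a byte buffer plus an absolute bit position. */
typedef struct {
    const uint8_t *data;
    size_t         bit_pos;  /* index of the next unread bit */
} BitReader;

/* Read one bit, most significant bit of each byte first (an assumption). */
static unsigned read_bit(BitReader *r)
{
    unsigned bit = (r->data[r->bit_pos >> 3] >> (7u - (r->bit_pos & 7u))) & 1u;
    r->bit_pos++;
    return bit;
}

/* Decode one number: count the run of 1s, consume the 0 that ends it
   (assumed not to belong to m), then read k more bits as m. */
static uint32_t decode_naive(BitReader *r, unsigned k)
{
    uint32_t ones = 0;
    uint32_t m = 0;
    unsigned i;

    assert(k < 32);
    while (read_bit(r))          /* the terminating 0 is consumed here */
        ones++;
    for (i = 0; i < k; i++)      /* one loop iteration per payload bit */
        m = (m << 1) | read_bit(r);
    return (ones << k) + m;
}
```

Every decoded number costs one loop iteration per bit here, which is exactly the overhead I would like to get rid of.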

Can I avoid this loop somehow?

TravisG

1 Answer


The idea of sequentially extracting single bits is not too bad. If done right, it may be almost as fast as any other solution.

Bit sequences at arbitrary positions in a stream of granularity g, with g=16 for a stream of (16-bit) words for instance, can be handled block-wise on blocks of size g.

To extract the bits at positions s up to, but not including, e (with 0 < (e - s) <= g) from a stream as a 'right-aligned' number, an example implementation may be:

shift = s % g

lowerBits = data[ floor( s / g ) ] >> shift
upperBits = (shift == 0) ? 0 : data[ floor( (e - 1) / g ) ] << (g - shift)

bitSequence = (lowerBits | upperBits) & ( (1 << (e-s)) - 1 )[*]

[*] this last term only masks out any unneeded upper bits we may have got and makes them 0 in the final result. The intermediate values need a type wider than g bits, otherwise the left shifts overflow.

(Careful with the endianness of your data :))
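As a rough C sketch of the block-wise extraction, assuming g = 16, a stream of `uint16_t` words in which bit 0 of `data[0]` is stream position 0, and an end position e that is exclusive (`extract_bits` and `G` are hypothetical names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { G = 16 };  /* block width g in bits */

/* Return bits s .. e-1 (e exclusive, 0 < e - s <= G) right-aligned.
   Bit 0 of data[0] is stream position 0 (LSB first, an assumption). */
static uint32_t extract_bits(const uint16_t *data, size_t s, size_t e)
{
    size_t   n     = e - s;
    unsigned shift = (unsigned)(s % G);
    uint32_t bits;

    assert(n > 0 && n <= G);
    bits = (uint32_t)data[s / G] >> shift;
    /* Touch the next block only when the range really crosses into it.
       Crossing implies shift > 0, so the shift count below stays under G. */
    if ((e - 1) / G != s / G)
        bits |= (uint32_t)data[(e - 1) / G] << (G - shift);
    /* Mask off anything above the n requested bits. */
    return bits & ((UINT32_C(1) << n) - 1u);
}
```

The intermediate value is held in 32 bits so that neither the left shift nor the mask overflows, and since G is a power of two a compiler will normally turn `/ G` and `% G` into a shift and a mask.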

Whether this will really speed things up cannot be determined in general; it depends on the data being processed, the underlying hardware, the compiler used, etc. Note that the divisions and the modulo operation can slow the algorithm down significantly unless g is a power of two, in which case they reduce to a shift and a bit mask.

Extracting bits one-by-one can be done quite efficiently in the same manner. For example:

blockIndex = floor( bitPosition / g )
bitIndex = bitPosition % g
nextBit = (data[ blockIndex ] >> bitIndex) & 1

This can of course be optimized to avoid the re-calculation of blockIndex and bitIndex if and when the bitPosition is always only incremented by 1.
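A minimal sketch of that optimization, assuming 16-bit blocks read least significant bit first: the reader keeps blockIndex and bitIndex as state and only increments them, so no division or modulo is needed per bit. `BitCursor` and `next_bit` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cursor: the two indices are kept as state instead of being
   recomputed from an absolute bit position on every read. */
typedef struct {
    const uint16_t *data;
    size_t          block_index;
    unsigned        bit_index;   /* 0 .. 15 */
} BitCursor;

/* Return the next bit (LSB of each block first) and advance by one. */
static unsigned next_bit(BitCursor *c)
{
    unsigned bit;

    assert(c->bit_index < 16);
    bit = (c->data[c->block_index] >> c->bit_index) & 1u;
    if (++c->bit_index == 16) {  /* block exhausted: move to the next one */
        c->bit_index = 0;
        c->block_index++;
    }
    return bit;
}
```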

Another common approach is to use a variable 'mask' to extract the single bits:

mask = 1
index = 0
while ( not all bits read ) { 
  block = data[index]
  if ( (mask & block) != 0 ) {
    // a 1 was encountered
  } else {
    // a 0 was encountered
  }
  mask = mask << 1
  if ( mask == 0 ) {
    mask = 1
    index = index + 1
  }
}

Note how the mask is used to both mask the current bit and keep track of when to advance to the next block of data. For this to work, mask must of course be of the same width g as the data blocks.
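Applied to the counting step from the question, the mask technique might look like this in C. It is a sketch assuming 16-bit blocks read least significant bit first; `count_leading_run` is a hypothetical name. The block is reloaded only when the index changes, not once per bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the run of consecutive 1 bits at the start of a stream of 16-bit
   blocks, LSB of each block first (an assumption). */
static size_t count_leading_run(const uint16_t *data, size_t n_blocks)
{
    size_t   run   = 0;
    size_t   index = 0;
    uint16_t mask  = 1;
    uint16_t block;

    assert(n_blocks > 0);
    block = data[0];
    while (block & mask) {
        run++;
        mask = (uint16_t)(mask << 1);  /* becomes 0 after bit 15 */
        if (mask == 0) {               /* current block is used up */
            if (++index == n_blocks)
                break;                 /* end of stream */
            mask  = 1;
            block = data[index];       /* reload only on index change */
        }
    }
    return run;
}
```

The cast on the shift matters: `mask << 1` is computed as an `int`, so without truncating back to 16 bits the mask would never reach 0.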

To sum it all up:

I don't think that, in the general case, a solution can be more efficient than one loop iteration per bit read; any optimizations will only change the performance slightly in one direction or the other.

JimmyB
  • Your last sentence is disappointing. I need to get an algorithm running below 120ms / second. Right now it takes about 160, about 100ms of which are because I count the consecutive ones by extracting bits bit-by-bit. I was hoping there would be a much faster solution, but I guess not :P – TravisG Dec 13 '12 at 14:23
  • I assume you have already included the basic optimizations? - Try to avoid repeated calculations of the same value in the loop; like in my last example the `block = data[index]` operation *for every bit* - this should of course only be done each time the `index` changes. Can you post your code? – JimmyB Dec 13 '12 at 17:46
  • Just as an info, the final algorithm I ended up using was just to read an entire 32 bit integer, then do __asm{ not integer; bsr eax,integer } thus I have the consecutive ones, then reading k more bits is just shifting the integer to mask out your k bits. – TravisG Feb 06 '13 at 08:59
  • Thanks for getting back to share your solution :) Your code is for i386 architecture in this case? – JimmyB Feb 12 '13 at 08:05
  • Yep. You can make it somehow portable by just doing ~integer; and using an intrinsic for bsr, but on VS2005 for some reason the compiler didn't generate the bsr instruction for me so I had to do it in asm. Of course, this screws me in x64 mode but whatever. – TravisG Feb 13 '13 at 20:12
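The approach described in the comments above (invert the word, then find its highest set bit) could be sketched in C roughly as follows. This is not the asker's actual code. It assumes a 32-bit window whose most significant bit is the next stream bit, that the 0 ending the run is not part of m, and that the run, that 0 and the k bits all fit in the window. It uses the GCC/Clang builtin `__builtin_clz` where available, with a plain loop as fallback; `leading_ones32` and `decode_window` are hypothetical names. Note that `bsr` returns the index of the highest set bit, so the run length is 31 minus that index, which is what a count-leading-zeros gives directly.

```c
#include <assert.h>
#include <stdint.h>

/* Number of consecutive 1 bits starting at the most significant bit. */
static unsigned leading_ones32(uint32_t x)
{
    uint32_t inv = ~x;
    unsigned n = 0;

    if (inv == 0)
        return 32;               /* all ones: clz/bsr of 0 is undefined */
#if defined(__GNUC__) || defined(__clang__)
    n = (unsigned)__builtin_clz(inv);
#else
    while (!(inv & UINT32_C(0x80000000))) {  /* portable fallback */
        inv <<= 1;
        n++;
    }
#endif
    return n;
}

/* Decode one number from a 32-bit window (MSB = next stream bit).
   Assumes ones + 1 + k <= 32; *bits_used tells the caller how far
   to advance the stream. */
static uint32_t decode_window(uint32_t window, unsigned k, unsigned *bits_used)
{
    unsigned ones = leading_ones32(window);
    uint32_t m = 0;

    assert(ones + 1 + k <= 32);
    if (k > 0)                   /* drop run and terminator, keep top k bits */
        m = (window << (ones + 1)) >> (32 - k);
    *bits_used = ones + 1 + k;
    return ((uint32_t)ones << k) + m;
}
```

The counting loop disappears entirely; what remains is refilling the window after each number, which costs a few shifts per decoded value rather than one iteration per bit.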