
Alright so here goes.

I need to write an extension method for the System.IO.BinaryReader class that can read a specific format. I have no idea what this format is called, but I know exactly how it works, so I will describe it below.

Each byte that makes up the value is flagged to indicate how the reader will need to behave next. The first byte has 2 flags, and any subsequent bytes for the value have only 1 flag.

First byte:
 01000111
 ^^^^^^^^
 |||____|_ 6 bit value
 ||_______ flag: next byte required
 |________ flag: signed value

Next bytes:
 00000011
 ^^^^^^^^
 ||_____|_ 7 bit value
 |________ flag: next byte required

The first byte of the value has 2 flags. The first bit says whether the value is positive (0) or negative (1). The second bit says whether another byte needs to be read. The remaining 6 bits are the value so far, which needs to be kept for later.

If no more bytes need to be read then you just return the 6 bit value with the right sign as dictated by the first bit flag.

If another byte needs to be read then you read the first bit of that byte, and that will indicate if another byte needs to be read. The remaining 7 bits are the value here. That value will need to be joined with the 6 bit value from the first byte.

So in the case of the example above:

The first byte was 01000111, which means it is positive, another byte needs to be read, and the value so far is 000111.

Another byte is read and it is 00000011. Therefore no more bytes need to be read, and the value here is 0000011. That is joined onto the front of the value so far, like so: 0000011000111

That is therefore the final value: 0000011000111 or 199

0100011100000011 turns into this: 0000011000111
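
In code terms, I think the joining step is just a shift and an OR. Here is a rough sketch of the arithmetic for this one example only (the class and method names are made up for illustration, and this is not a general reader):

```csharp
using System;

static class VlqWorkedExample // hypothetical name, for illustration only
{
    // Decodes the two-byte example 01000111 00000011 by hand.
    public static int DecodeTwoByteExample()
    {
        byte first = 0x47;   // 01000111: sign = 0, next = 1, value = 000111
        byte second = 0x03;  // 00000011: next = 0, value = 0000011

        int low = first & 0x3F;    // the 6 value bits of the first byte
        int high = second & 0x7F;  // the 7 value bits of the second byte

        // The later byte goes in front, so shift it past the 6 bits already read.
        return (high << 6) | low;  // 0000011000111 = 199
    }
}
```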

Here is another example:

 011001111000110100000001
 ^^^^^^^^^^^^^^^^^^^^^^^^
 |      ||      ||______|_____ Third Byte (1 flag)
 |      ||______|_____________ Second Byte (1 flag)
 |______|_____________________ First Byte (2 flags)

First Byte:
 0      - Positive
 1      - Next Needed
 100111 - Value

Second Byte:
 1       - Next Needed
 0001101 - Value

Third Byte:
 0       - Next Not Needed
 0000001 - Value

Value:
 00000010001101100111 = 9063

Hopefully my explanation was clear :)

Now I need to write a clear, simple and, most importantly, fast extension method for System.IO.BinaryReader to read such a value from a stream. My attempts so far are kind of bad and unnecessarily complicated, involving boolean arrays and BitArrays.

Therefore I could really do with somebody helping me write such a method; that would be really appreciated!

Thanks for reading.

  • This can be done byte-by-byte of course. What does the "signed value" flag mean exactly? – harold Jul 14 '18 at 00:06
  • The signed bit is simply whether it is positive or negative: if 0 it is positive, if 1 it is negative. –  Jul 14 '18 at 20:12
  • But how? Does that mean if the bit is set, negate the raw value? Or sign extend? Something else? Can a negative zero be encoded and if so, what do you want to decode it to? – harold Jul 14 '18 at 22:54
  • It's literally treated as a boolean: once you have the final value, multiply it by -1 if that was true. If it is 0 then it stays zero. Here are another 2 examples: 10001001 => 001001 => -9, and 110011001001000100000001 => 00000010010001001100 => -9292 –  Jul 15 '18 at 00:10
  • It would never be the case that you would end up with something like 10000000, simply because the value is zero, which when encoded to begin with would just be 00000000. –  Jul 15 '18 at 00:23
  • I know this is confusing :) but I'm trying to reverse engineer some binary data. This type of encoded value is typically placed at the beginning of a block of bytes, and reading the value will tell you how many bytes to read. Therefore it would never make sense to have a zero value. Now I know that doesn't make sense if it can return a negative value but there are some fringe cases, such as strings where a negative value indicates to read in Unicode, and positive to read in ASCII –  Jul 15 '18 at 00:23
  • tbh it's not important the sign, what matters to me is getting the value from the 6-7-7 etc.. structure. I'd be perfectly happy with a tuple return containing a boolean and the (always positive in this case) value. –  Jul 15 '18 at 00:28

2 Answers


Based on the description in the comments I came up with this. Unusually, it reads signed bytes, since that makes the continue flag slightly easier to check (not tested):

static int ReadVLQInt32(this BinaryReader r)
{
    sbyte b0 = r.ReadSByte();
    // the first byte has 6 bits of the raw value
    int shift = 6;
    int raw = b0 & 0x3F;
    // first continue flag is the second bit from the top, shift it into the sign
    // (unchecked so the truncating cast also works when compiled with checked arithmetic)
    sbyte cont = unchecked((sbyte)(b0 << 1));
    while (cont < 0)
    {
        sbyte b = r.ReadSByte();
        // these bytes have 7 bits of the raw value
        raw |= (b & 0x7F) << shift;
        shift += 7;
        // continue flag is already in the sign
        cont = b;
    }
    return b0 < 0 ? -raw : raw;
}

It can easily be extended to read a long too. Just make sure to use b & 0x7FL, otherwise that value is shifted as an int and bits would get dropped.
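
As a rough illustration of that long variant, here is a minimal sketch. The names BinaryReaderVlqExtensions and ReadVLQInt64 are assumptions, not part of the method above. It tests the flags with masks on unsigned bytes instead of the signed-byte trick, and like the int version it does not guard against overlong input.

```csharp
using System.IO;

static class BinaryReaderVlqExtensions // hypothetical class name
{
    // Sketch of a 64-bit variant; ReadVLQInt64 is an assumed name.
    public static long ReadVLQInt64(this BinaryReader r)
    {
        byte b0 = r.ReadByte();
        bool negative = (b0 & 0x80) != 0;  // top bit: sign flag
        bool cont = (b0 & 0x40) != 0;      // second bit: another byte follows
        long raw = b0 & 0x3F;              // the first byte carries 6 value bits
        int shift = 6;
        while (cont)
        {
            byte b = r.ReadByte();
            // 0x7FL makes the operand a long, so the shift is done in 64 bits
            raw |= (b & 0x7FL) << shift;
            shift += 7;
            cont = (b & 0x80) != 0;        // top bit of each later byte: continue flag
        }
        return negative ? -raw : raw;
    }
}
```

Reading the bytes 0x67 0x8D 0x01 from the question through a BinaryReader over a MemoryStream should give 9063 with this sketch, and 0xCC 0x91 0x01 should give -9292.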

harold
  • Thank you! works beautifully, I did end up changing it to a long instead. –  Jul 15 '18 at 00:55

A version that checks for illegal values (an overlong sequence of 0xFF, 0xFF... for example), and also works with the checked math of C# (there is a C# compiler option to use checked math to detect overflows):

public static int ReadVlqInt32(this BinaryReader r)
{
    byte b = r.ReadByte();

    // the first byte has 6 bits of the raw value
    uint raw = (uint)(b & 0x3F);

    bool negative = (b & 0x80) != 0;

    // first continue flag is the second bit from the top
    bool cont = (b & 0x40) != 0;

    if (cont)
    {
        int shift = 6;

        while (true)
        {
            b = r.ReadByte();
            cont = (b & 0x80) != 0;
            b &= 0x7F;

            if (shift == 27)
            {
                if (negative)
                {
                    // largest allowed magnitude is abs(int.MinValue) = 0x80000000
                    if (b > 0x10 || (b == 0x10 && raw != 0))
                    {
                        throw new Exception();
                    }
                }
                else
                {
                    // maximum value int.MaxValue
                    if (b > 0xF)
                    {
                        throw new Exception();
                    }
                }
            }

            // these bytes have 7 bits of the raw value
            raw |= ((uint)b) << shift;

            if (!cont)
            {
                break;
            }

            if (shift == 27)
            {
                throw new Exception();
            }

            shift += 7;
        }
    }

    // We use unchecked here to handle int.MinValue
    return negative ? unchecked(-(int)raw) : (int)raw;
}
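
To illustrate why the last line is wrapped in unchecked, here is a small sketch that isolates just that conversion. The class and method names are hypothetical. It assumes raw holds 0x80000000, the magnitude of int.MinValue, which does not fit in a positive int.

```csharp
using System;

static class CheckedNegationDemo // hypothetical name, for illustration only
{
    // Mirrors the final expression of the method above.
    // For raw = 0x80000000 the cast wraps to int.MinValue, and negating
    // int.MinValue without overflow checks yields int.MinValue again.
    public static int NegateUnchecked(uint raw)
    {
        return unchecked(-(int)raw);
    }

    // The same expression with overflow checking forced on.
    // The cast throws OverflowException when raw > int.MaxValue.
    public static int NegateChecked(uint raw)
    {
        return checked(-(int)raw);
    }
}
```

So NegateUnchecked(0x80000000) returns int.MinValue, while NegateChecked(0x80000000) throws, which is what would happen to the return statement if the project were compiled with checked arithmetic and the unchecked wrapper were missing.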
xanatos