
I need help identifying the following technique. It is a lengthy read, so please bear with me. My question is whether this is a known standard: does it have a name, and has anyone seen it before? What is the benefit? In case you are wondering, this relates to a packet captured from a long-forgotten online PS2 game; I am part of a team that is trying to bring it back.

Note that this is not the size as described by the IP protocol; this size representation is within the actual payload, and it is for client and server consumption. The following describes how the size of the message is represented. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The true packet length is 94 bytes. The length is encoded in bytes 5-6 [CF E0] of the payload data, after all of the IP protocol headers. Also, note that we must interpret these two bytes as being in little-endian format. Thus, we should think of these two bytes as [E0 CF].

We determine the Packet Class from these two bytes by taking the first nibble (4 bits) of the first byte. In this particular case, this is just 0xE, so we identify this packet as having a packet class of 0xE. This was identified as a Session Initiator Packet Class.

Now, to determine the packet length from the remaining nibble and the second byte: first we convert the second byte to decimal, giving 0xCF = 207. The difference between this value and the actual length is 207 - 94 = 113 bytes. Originally I knew this byte was proportional to the packet length but had some offset; I wasn't sure where this offset came from. Additionally, the offset seemed to change for different packets, so more study was required.

Eventually, I found out that each packet class had a different offset, so I needed to examine only packets in the same packet class to figure out the offset for that class. In doing this, I made a table of all the reported lengths (in byte 5) and compared them to the actual packet lengths. What I discovered is that:

  • almost all of the reported packet lengths in byte 5 were greater than 0x80 = 128;
  • the second nibble in the other byte worked as a type of multiplier for the packet length;
  • each packet class had an associated minimum and maximum packet length that could be represented. For the 0xC packet class I was examining, the minimum packet size was 18 bytes and the maximum packet size was approximately 10*128 + 17 = 1297 bytes.

This led to the following approach to extract the packet length from the fifth and sixth bytes of the packet header. First note that we have previously determined the packet class to be 0xE, and that the minimum packet size associated with this packet class is 15 bytes. Now, take the second nibble of the first byte, [0xE0] = 0 in this case, and multiply it by 128 bytes: 0*128 = 0 bytes. Next, take the second byte, [0xCF] = 207 in this case, and subtract out 128: 0 + (207 - 128) = 79. Finally, add in the minimum packet size for this packet class (0xE = 15-byte minimum). So (0*128) + (207 - 128) + 15 = 94. This matches the true packet size.
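
To make the above concrete, here is a minimal C sketch of the extraction. The min_len table is hypothetical: only the 0xE = 15 and 0xC = 18 minimums come from our captures, the rest are placeholders.

#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-class minimum sizes; only 0xC (18) and 0xE (15) are
   taken from the captures described above, the rest are unknown. */
static const int min_len[16] = {
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 18, 0, 15, 0
};

int main(void) {
    /* Bytes 5-6 of the payload as seen on the wire: [CF E0]. */
    uint8_t wire[2] = { 0xCF, 0xE0 };

    /* Interpret as little-endian, i.e. think of the pair as [E0 CF]. */
    uint8_t first  = wire[1];   /* 0xE0 */
    uint8_t second = wire[0];   /* 0xCF */

    int cls        = first >> 4;    /* first nibble  -> packet class 0xE */
    int multiplier = first & 0x0F;  /* second nibble -> 0                */
    int length     = multiplier * 128    /* 0 * 128 = 0                  */
                   + (second - 128)      /* 207 - 128 = 79               */
                   + min_len[cls];       /* + 15 (class 0xE minimum)     */

    printf("class=0x%X length=%d\n", cls, length);  /* class=0xE length=94 */
    return 0;
}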

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This formula was tested on 20,000 subsequent packets and it works. But why go through all that trouble just to indicate the size of the message that follows? I thought it was a form of encryption, but the rest of the message is not encrypted at all. The formula is understood, but I don't see a benefit. I am thinking that maybe it is a way to optimize the size of the packet by passing a number greater than 255 using only one byte, but that saves exactly one byte; adding another byte would yield a max value of 65,535, so why not just put another byte into the stream? I am sure one extra byte is not going to have a great impact on the network, so what could be the purpose? I am hoping someone else will see what I'm missing, or connect this to some documented standard, protocol, pattern, or technique.

Also, I do not take credit for figuring out the formula above; that was done by another team member.

Hydrogen-4
    I think it is possible that the second byte has a requirement that bit 7 must always be set (for whatever reason, e.g. backward compatibility). When processing the length, bit 7 needs to be cleared first with AND 0x7F, or 128 must be subtracted from the byte value read (128 = 2**7). It looks like the packet length is encoded in byte1[3:0]:byte2[6:0] providing an 11-bit byte count. The minimum length is presumably subtracted out so as to maximize the possible length values that can be encoded; encode-able length is in [15,15*128+127+15] = [15,2062]. – njuffa Dec 07 '16 at 19:23
  • It would be useful to post a few more examples: Is the "Class" always equal to the min size? Is there an example with byte5 bit7 == 0? Is there an example with byte6 low nibble non-zero? – AShelly Dec 14 '16 at 21:37

1 Answer


My best guess is that the receiver uses some form of variable-length base128 encoding, like LEB128.

But in this case, the sender, knowing the actual max size fits in 11 bits, forces the encoding to use 2 bytes and overloads the high nibble for the "class". This makes the header size and construction time constant, and the receiver side can just mask out the class and run it through a standard decoder.

Send:

len -= minlen[class];                 /* encode only the bytes above the class minimum   */
byte[5] = (len & 0x7F) | 0x80;        /* low 7 bits, with the continuation bit forced on */
byte[6] = (len >> 7) | (class << 4);  /* high bits of length, class in the top nibble    */

Receive:

class = byte[6] >> 4;                    /* class lives in the top nibble             */
byte[6] &= 0xF;                          /* mask it out so only length bits remain    */
len = decode(&byte[5]) + minlen[class];  /* base128-decode, then add class minimum    */

where:

int decode(byte* data) {
  int v = *data & 0x7F;            /* low 7 bits from the first byte             */
  int shift = 7;
  while (*data & 0x80) {           /* continuation bit set: another byte follows */
    data++;
    v |= (*data & 0x7F) << shift;  /* each further byte supplies 7 higher bits   */
    shift += 7;
  }
  return v;
}

One other possibility is that byte[5] is signed, and length is reconstructed by
(int8_t)byte[5] + 128*((byte[6]&0xF)+1) + minlen[byte[6]>>4];
But I can't think of any reason to construct it this way.
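
For what it's worth, both reconstructions give the same answer for the example bytes in the question. A quick sketch, assuming byte[5] = 0xCF, byte[6] = 0xE0, and a class-0xE minimum of 15:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t b5 = 0xCF, b6 = 0xE0;
    int minlen_0xE = 15;  /* class 0xE minimum, per the question */

    /* Base128 reading: low 7 bits of byte[5], high bits from byte[6]'s low nibble. */
    int len_base128 = (((b6 & 0xF) << 7) | (b5 & 0x7F)) + minlen_0xE;

    /* Signed-byte reading: (int8_t)0xCF = -49, plus 128*((0xE0 & 0xF) + 1) = 128, plus 15. */
    int len_signed = (int8_t)b5 + 128 * ((b6 & 0xF) + 1) + minlen_0xE;

    printf("%d %d\n", len_base128, len_signed);  /* both print 94 */
    return 0;
}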

AShelly