I would like to know whether the following is possible with any of the SIMD instruction families.
I have a qword input with 63 significant bits (never negative). Each successive group of 7 bits, starting from the LSB, is shuffle-aligned into its own byte and left-padded with a 1 (bit 7 set), except for the most significant non-zero byte, which is padded with a 0. To illustrate, I'll use letters for clarity.
The result contains only the significant bytes, so it is 0 to 9 bytes long, and it is then converted to a byte array.
```
In:  0|kjihgfe|dcbaZYX|WVUTSRQ|PONMLKJ|IHGFEDC|BAzyxwv|utsrqpo|nmlkjih|gfedcba
Out: 0kjihgfe|1dcbaZYX|1WVUTSRQ|1PONMLKJ|1IHGFEDC|1BAzyxwv|1utsrqpo|1nmlkjih|1gfedcba
Size = 9

In:  00|nmlkjih|gfedcba
Out: |0nmlkjih|1gfedcba
Size = 2
```
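The output size in these examples is one byte per started 7-bit group of significant bits, i.e. `(bitLength + 6) / 7`. A minimal sketch, assuming .NET Core 3.0+ for `System.Numerics.BitOperations`; the class and method names (`SizeSketch`, `EncodedSize`) are hypothetical:

```csharp
using System.Numerics;

static class SizeSketch
{
    // Hypothetical helper: number of output bytes for a non-negative 63-bit input.
    // 0 -> 0 bytes, 1..127 -> 1 byte, 128..16383 -> 2 bytes, ..., up to 9 bytes.
    public static int EncodedSize(long j)
    {
        if (j <= 0) return 0;
        int bits = 64 - BitOperations.LeadingZeroCount((ulong)j); // 1..63 significant bits
        return (bits + 6) / 7;                                     // round up to whole 7-bit groups
    }
}
```

For the second example above, 14 significant bits give `(14 + 6) / 7 = 2`.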
I understand that setting the padding bits is a separate step; my question is only about the shuffle-aligning. Is this possible?
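One candidate worth benchmarking, sketched here under stated assumptions: spreading 7-bit groups into bytes is what BMI2's `pdep` (parallel bit deposit) does with the mask `0x7F7F7F7F7F7F7F7F`. BMI2 is a scalar bit-manipulation extension rather than a SIMD family, it first shipped on Intel with Haswell (so the Core 2 Duo mentioned below lacks it), and .NET Core 3.0+ exposes it as `Bmi2.X64.ParallelBitDeposit`. One 64-bit deposit covers only 8 × 7 = 56 bits, so the ninth group is handled separately. The names `Pdep7Sketch`, `Spread7` and `Encode` are hypothetical, and the portable loop is only a fallback.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

static class Pdep7Sketch
{
    // Deposits the low 56 bits of v into 8 bytes, 7 bits per byte (bit 7 of each byte is 0).
    public static ulong Spread7(ulong v)
    {
        if (Bmi2.X64.IsSupported)
            return Bmi2.X64.ParallelBitDeposit(v, 0x7F7F7F7F7F7F7F7FUL);

        // Fallback for CPUs without BMI2: same result, one 7-bit group at a time.
        ulong r = 0;
        for (int i = 0; i < 8; i++)
            r |= ((v >> (7 * i)) & 0x7FUL) << (8 * i);
        return r;
    }

    // Hypothetical encoder: writes 0-9 bytes into result (length >= 9), returns the count.
    public static int Encode(long j, byte[] result)
    {
        if (j <= 0) return 0;
        ulong v = (ulong)j;
        int size = (64 - BitOperations.LeadingZeroCount(v) + 6) / 7; // 1..9

        ulong low = Spread7(v);                      // groups 0..7 aligned to bytes 0..7
        ulong flags = 0x8080808080808080UL;          // bit 7 of every low byte
        if (size < 9)
            flags &= (1UL << (8 * (size - 1))) - 1;  // keep the 1-padding only below the top byte
        low |= flags;

        for (int i = 0; i < Math.Min(size, 8); i++)
            result[i] = (byte)(low >> (8 * i));
        if (size == 9)
            result[8] = (byte)(v >> 56);             // top 7 bits, bit 7 already 0
        return size;
    }
}
```

Note that `pdep` is microcoded and slow on AMD CPUs before Zen 3, so a plain shift-and-mask loop can win there.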
EDIT 2
Here is my updated code. It sustains 46 M/sec on random-length input, single-threaded, on a 2 GHz Core 2 Duo (64-bit).
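```csharp
// Splits j into 7-bit groups, least significant group first, one group per byte.
// Every byte except the last one written has bit 7 (0x80) set.
// Returns the number of bytes written (0-9).
private static int DecodeIS8(long j, ref byte[] result)
{
    // Zero and negative inputs produce no output bytes.
    if (j <= 0)
    {
        return 0;
    }
    int size;
    // neater code: gives something to break out of
    while (true)
    {
        // Store the next 7-bit group with bit 7 set; stop once j runs out of bits.
        result[0] = (byte)((j & 0x7F) | 0x80);
        size = 0;
        j >>= 7;
        if (j == 0) break;
        result[1] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[2] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[3] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[4] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[5] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[6] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        result[7] = (byte)((j & 0x7F) | 0x80);
        size++;
        j >>= 7;
        if (j == 0) break;
        // After 56 bits only the top 7 remain, so bit 7 is already clear.
        result[8] = (byte)j;
        return 9;
    }
    // size indexes the last byte written; clear its bit 7.
    result[size] ^= 0x80;
    return size + 1;
}
```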
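For comparison, the unrolled method above can be expressed as a loop that emits continuation bytes while more than 7 bits remain. This is a sketch for reference, not a claim that it is faster; the names `VarintSketch` and `EncodeLoop` are hypothetical.

```csharp
static class VarintSketch
{
    // Loop form of the unrolled encoder: 7 bits per byte, least significant group first,
    // bit 7 set on every byte except the last one written. result must hold 9 bytes.
    public static int EncodeLoop(long j, byte[] result)
    {
        if (j <= 0) return 0; // same convention as DecodeIS8: no bytes for 0
        int size = 0;
        while (j > 0x7F)
        {
            result[size++] = (byte)((j & 0x7F) | 0x80); // more groups follow
            j >>= 7;
        }
        result[size++] = (byte)j; // final group, bit 7 clear
        return size;
    }
}
```

For positive inputs this byte layout appears to match what `BinaryWriter.Write7BitEncodedInt64` (.NET 5+) writes; the difference is that the BCL method writes a single `0x00` byte for 0, whereas this convention writes none.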