Is it possible to use SSE for bit manipulations on data that is not byte-aligned? For example, I would like to implement this using SSE:
const char buf[8];
assert(n <= 8);
long rv = 0;
for (int i = 0; i < n; i++)
    rv = (rv << 6) | (buf[i] & 0x3f);
Instead of looping, I would like to load buf into an XMM register and use SSE instructions. Unfortunately, the shift operations (such as PSLLW) shift every packed integer by the same amount, so I cannot use them here. Using multiplication (PMULLW) to emulate per-element shifts does not seem right either...
Looking at the SSE documentation, it appears that bit manipulations are not particularly well supported in general. Is this true? Or are there nice bit-manipulation examples using SSE?
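
For reference, here is a rough sketch of where the multiplication idea leads if a multiply-add (PMADDWD) is used instead of PMULLW: multiplying one element of each pair by a power of two and adding its neighbour acts as a per-element shift plus OR, since the bit fields never overlap. It assumes SSE2 only, assumes all 8 bytes of buf are readable even when n < 8, and the function name pack6_sse2 is made up for illustration. With SSSE3, PMADDUBSW could probably do the first merge directly on the bytes.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: packs the low 6 bits of the first n bytes of buf,
   first byte most significant. Always reads all 8 bytes of buf. */
static uint64_t pack6_sse2(const char buf[8], int n)
{
    assert(n >= 0 && n <= 8);

    /* Load 8 bytes into the low half of the register, keep 6 bits of each. */
    __m128i v = _mm_loadl_epi64((const __m128i *)buf);
    v = _mm_and_si128(v, _mm_set1_epi8(0x3f));

    /* Widen the bytes to eight 16-bit words b0..b7. */
    __m128i w = _mm_unpacklo_epi8(v, _mm_setzero_si128());

    /* PMADDWD with (64, 1): each 32-bit lane = b[2i]*64 + b[2i+1], 12 bits. */
    __m128i d = _mm_madd_epi16(w, _mm_set1_epi32(0x00010040));

    /* Narrow the four 12-bit dwords to words; values are too small to saturate. */
    __m128i p = _mm_packs_epi32(d, _mm_setzero_si128());

    /* PMADDWD with (4096, 1): each 32-bit lane now holds 24 bits (four bytes). */
    __m128i q = _mm_madd_epi16(p, _mm_set1_epi32(0x00011000));

    /* Combine the two 24-bit halves in scalar code. */
    uint64_t hi = (uint32_t)_mm_cvtsi128_si32(q);                     /* b0..b3 */
    uint64_t lo = (uint32_t)_mm_cvtsi128_si32(_mm_srli_epi64(q, 32)); /* b4..b7 */
    uint64_t all = (hi << 24) | lo;

    /* Drop the fields that belong to bytes at index n and above. */
    return all >> (6 * (8 - n));
}
```

Calling pack6_sse2(buf, n) should return the same value as the loop above for any n from 0 to 8. Whether it is actually faster than the scalar loop for only 8 bytes would need measuring.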