Is it possible to use SSE for bit manipulations on data that is not byte-aligned? For example, I would like to implement this using SSE:

const char buf[8];
assert(n <= 8);
long rv = 0;
for (int i = 0; i < n; i++)
    rv = (rv << 6) | (buf[i] & 0x3f);

Instead, I would like to load buf into an xmm register and use SSE instructions to avoid the loop. Unfortunately, the shift operations (such as PSLLW) shift each packed integer by the same amount, so I cannot use them here. Using multiplication (PMULLW) to emulate shifts does not seem right either...

Looking at the SSE documentation, it appears that bit manipulations are not particularly well supported in general. Is this true? Or are there nice bit-manipulation examples using SSE?

hrr

2 Answers

I'm not sure SSE instructions help reduce the number of operations required to implement what your code performs here; if anyone knows, I'd be curious as well. Let's decompose the code a bit.

The code is a repeated shift/OR sequence: you take the lowest 6 bits of a byte, shift the accumulator left by six, OR in the lowest 6 bits of the next byte, and so on.

So you're converting an array of eight-bit values into a packed array of six-bit values, which shrinks the data from 64 bits to 48 bits. Like:

|76543210|76543210|76543210|76543210|76543210|76543210|76543210|76543210|
|-----------------|54321054|32105432|10543210|54321054|32105432|10543210|

You can therefore unroll the loop (for the n == 8 case) and write it as follows:

#include <stdint.h>

/*
 * ((uint64_t)buf[x] << 58)
 *   moves the lowest six bits of buf[x] into the highest bits of a 64-bit
 *   value and discards all other bits
 *
 * >> (6 * (x) + 16)
 *   shifts the bits into the expected final position; the value must be
 *   unsigned so that the right shift does not sign-extend
 */
#define L(x) (((uint64_t)buf[x] << 58) >> (6 * (x) + 16))

uint64_t rv = L(0) | L(1) | L(2) | L(3) | L(4) | L(5) | L(6) | L(7);
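To check the unrolled form against the original loop, here is a minimal sketch. The function names `pack6_loop` and `pack6_unrolled` and the macro name `FIELD` are made up for illustration. It assumes `buf` always has 8 readable bytes, and it handles n < 8 by packing all eight fields and then shifting the unused low fields away, which is my addition rather than part of the answer above.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: the loop from the question, with an unsigned accumulator. */
static uint64_t pack6_loop(const unsigned char *buf, int n)
{
    assert(n >= 0 && n <= 8);
    uint64_t rv = 0;
    for (int i = 0; i < n; i++)
        rv = (rv << 6) | (buf[i] & 0x3f);
    return rv;
}

/* Unrolled: packs all 8 bytes, then drops the (8 - n) unused low fields.
 * Assumption: buf has 8 readable bytes even when n < 8. */
static uint64_t pack6_unrolled(const unsigned char buf[8], int n)
{
    assert(n >= 0 && n <= 8);
#define FIELD(x) (((uint64_t)buf[x] << 58) >> (6 * (x) + 16))
    uint64_t all = FIELD(0) | FIELD(1) | FIELD(2) | FIELD(3)
                 | FIELD(4) | FIELD(5) | FIELD(6) | FIELD(7);
#undef FIELD
    return all >> (6 * (8 - n));
}
```

Calling both functions on the same buffer for every n from 0 to 8 should give identical results.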

As mentioned, I'm not aware of an SSE instruction that would help with this kind of packing (the SSE pack instructions only narrow between fixed element widths, dword-to-word and word-to-byte, with saturation).

You can perform the operations inside SSE registers but not, as far as I can see, reduce the number of instructions required to get to the end result.
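For what it's worth, one way to do the work inside an SSE register is to let multiply-add (PMADDWD, `_mm_madd_epi16`) stand in for the per-field shifts. This is only a sketch under assumptions: an x86 target with SSE2, a `buf` with 8 readable bytes, and a hypothetical function name `pack6_sse2`. I have not measured whether it beats the scalar version.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: packs the low 6 bits of buf[0..n-1], first byte
 * in the most significant position, like the loop in the question. */
static uint64_t pack6_sse2(const unsigned char buf[8], int n)
{
    assert(n >= 0 && n <= 8);

    /* Load 8 bytes into the low half and keep 6 bits of each byte. */
    __m128i v = _mm_loadl_epi64((const __m128i *)buf);
    v = _mm_and_si128(v, _mm_set1_epi8(0x3f));

    /* Widen bytes to 16-bit lanes: w0 = buf[0], ..., w7 = buf[7]. */
    v = _mm_unpacklo_epi8(v, _mm_setzero_si128());

    /* Each 32-bit lane k becomes w[2k] * 64 + w[2k+1], a 12-bit value. */
    v = _mm_madd_epi16(v, _mm_set1_epi32(0x00010040));

    /* Narrow the four 12-bit values back into 16-bit lanes. */
    v = _mm_packs_epi32(v, v);

    /* Lane 0 = fields 0..3, lane 1 = fields 4..7, 24 bits each. */
    v = _mm_madd_epi16(v, _mm_set1_epi32(0x00011000));

    uint32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, v);

    /* Join the two 24-bit halves, then drop the unused low fields. */
    uint64_t all = ((uint64_t)lanes[0] << 24) | lanes[1];
    return all >> (6 * (8 - n));
}
```

The multipliers 64 and 4096 are the powers of two matching shifts by 6 and 12 bits. The final join of the two 24-bit halves is still done in scalar code.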

FrankH.

There are quite a few bitwise operations you can perform in SSE. You can use _mm_and_si128 and _mm_or_si128, and there is a large set of shift operations; search for _mm_slli_si128 to find the complete list. These instructions were added in SSE2, so they're widely available.
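Since the SSE2 shift instructions apply one count to every lane, a per-lane left shift has to be emulated. A minimal sketch, assuming an x86 target with SSE2 and using a made-up helper name `shl_per_lane_u16`, multiplies each 16-bit lane by a power of two with `_mm_mullo_epi16` (PMULLW):

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: out[i] = in[i] << counts[i], truncated to 16 bits.
 * Each count must be in the range 0..15. */
static void shl_per_lane_u16(const uint16_t in[8], const uint8_t counts[8],
                             uint16_t out[8])
{
    uint16_t pow2[8];
    for (int i = 0; i < 8; i++) {
        assert(counts[i] < 16);
        pow2[i] = (uint16_t)(1u << counts[i]);  /* multiplier 2^count */
    }

    __m128i v = _mm_loadu_si128((const __m128i *)in);
    __m128i m = _mm_loadu_si128((const __m128i *)pow2);

    /* PMULLW keeps the low 16 bits of each product, which equals a
     * left shift of that lane by counts[i]. */
    _mm_storeu_si128((__m128i *)out, _mm_mullo_epi16(v, m));
}
```

In real code the table of multipliers would normally be a constant rather than being rebuilt on every call. Per-lane right shifts cannot be expressed this directly.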

Jasper Bekkers
  • I have found [Intel's Intrinsic Guide](http://software.intel.com/en-us/avx/) particularly useful to find SSE instructions. Unfortunately, even though there are many shift instructions, they all take a fixed shift amount (i.e., one cannot shift each packed integer differently)... – hrr Jul 12 '11 at 20:28
  • You'll be able to do this with AVX 2.0 in 2013, but until then you're out of luck. – Paul R Jul 12 '11 at 20:35
  • Yeah, SSE is quite bad when it comes to certain operations. Just use integer multiplication to achieve the same result. – Jasper Bekkers Jul 12 '11 at 20:39
  • There is _mm_sll_si128, which takes a xmm register with the shift amount. What you want to do can be done in SSE. – Gunther Piez Jul 12 '11 at 22:12
  • @drhirsh, if I understand correctly, this also shifts each packed integer by the _same_ amount, so it would not be useful here... – hrr Jul 13 '11 at 07:02