I have a mask with a small number of set bits, just 3 or 4 of them.
The mask can be up to 64 bits wide, but let's take a short example: 10100101
I'd like to generate one mask per set bit, where each mask ends at that set bit and also includes the lower bits down to, but not including, the previous set bit:
00000001
00000110
00111000
11000000
I can do that in a loop by isolating the lowest set bit and adding the bits to its right, ((x & -x) << 1) - 1, and then removing the previous mask using XOR.
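For reference, here is a minimal sketch of that loop in C, assuming the mask is held in a uint64_t. The function name split_masks and the output array are only illustrative, and clearing the lowest set bit with x &= x - 1 is the step I use to advance to the next set bit.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: writes one segment mask per set bit of x into out,
   lowest segment first, and returns how many masks were written.
   out must have room for one entry per set bit (64 in the worst case). */
static size_t split_masks(uint64_t x, uint64_t *out)
{
    size_t n = 0;
    uint64_t prev = 0;                   /* union of all segments emitted so far */

    while (x != 0) {
        uint64_t low  = x & -x;          /* isolate the lowest set bit */
        uint64_t upto = (low << 1) - 1;  /* that bit plus every bit below it */
        out[n++] = upto ^ prev;          /* remove the bits of earlier segments */
        prev = upto;
        x &= x - 1;                      /* clear the lowest set bit and continue */
    }
    return n;
}
```

With x = 0xA5 (10100101) this produces 0x01, 0x06, 0x38 and 0xC0, matching the four masks listed above. Since the arithmetic is unsigned, a set bit 63 makes low << 1 wrap to 0, so upto becomes all ones.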
The question is: can this be done more efficiently, in parallel and without looping, using SWAR or SIMD?