Regarding bit masking in C: why is (~(~0 << N)) preferred over ((1 << N) - 1)?

Question

I do know that ~0 evaluates to a word-sized value with all bits set to 1 (and thus takes care of portability), but I still don't see why ((1 << N) - 1) is discouraged.

Please share if you used the second form and got into any trouble.

Because not all C compilers/platforms use 2-complement for negative numbers. — jv42, Oct 05 '11 at 09:53
@jv42 why does that apply? I see no negative numbers there, except ~0 but its negativity isn't needed for it to work — harold, Oct 05 '11 at 10:00
Can you provide a source where the `((1 << N) -1)` is discouraged? — Juraj Blaho, Oct 05 '11 at 10:07
@harold: The issue is if you do `unsigned int x = ~0;` in (for example) 1s' complement. `~0` is all-bits-set, and as a signed value that represents negative zero (or is a trap representation). So when it's converted to unsigned, the result is 0, not `UINT_MAX`. `unsigned int x = ~0;` does not take care of portability if by "portability" we mean "including non-2's-complement". The correct ways to do that are `unsigned int x = -1;` or `unsigned int x = UINT_MAX;`. — Steve Jessop, Oct 05 '11 at 10:40
@SteveJessop ~0 *is* -1 with 2's complement, but -1 need not be "all bits set" (in case of non-2's-complement, ie never). — harold, Oct 05 '11 at 10:46
@harold: that's true, but I'm not talking about 2's complement. In a 1s' complement implementation which doesn't support negative zeros, `~0` is undefined behavior. So strictly conforming code cannot write `~0`. `~0u` is OK, though. And the same considerations apply to `(1 << N)` in sign-magnitude representation, as apply to `~0` in 1's complement, so they're equally bad as far as signed representation is concerned. — Steve Jessop, Oct 05 '11 at 10:49
@SteveJessop that seems to argue in favour of `(1 << N) - 1` then, which could still fail but certainly less often (only when N >= 31) — harold, Oct 05 '11 at 10:54
@harold: well, if `N` isn't the width of `int` minus 1, then `(1 << N) - 1` certainly won't yield the representation with all bits set. I think that value is what's wanted. So I was assuming that `N` was just standing in for 31 on implementations with a 32-bit `int`, and some other value on other implementations. The fundamental problem is that on a 1s' complement representation, there might not *be* a signed value with all bits set, which means in strictly-conforming code you need to do your bit-twiddling in unsigned types. — Steve Jessop, Oct 05 '11 at 11:02
Similar issues were addressed fairly recently in Michael Barr's blog: http://embeddedgurus.com/barr-code/2011/06/is-uint16_t-1-portable-c-code. There was a great reader comment which addressed this very issue. — Lundin, Oct 05 '11 at 11:15
@SteveJessop I was assuming N was not "#bits - 1", otherwise there wouldn't be a problem in the first place - just use UINT_MAX — harold, Oct 05 '11 at 11:18
@harold: yes, the fact that the questioner doesn't actually say what result he wants, doesn't make it any easier to advise how to do it! If `N` is smaller then there's no reason to discourage `(1 << N) - 1`. — Steve Jessop, Oct 05 '11 at 11:22
-1 for presupposing the worse form is better rather than asking which is better and why, or if there's a better alternative to both. — R.. GitHub STOP HELPING ICE, Oct 05 '11 at 19:02
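Steve Jessop's comments above can be made concrete. A minimal sketch, assuming any conforming C99 compiler, of the all-bits-set spellings that stay entirely in unsigned arithmetic and therefore do not depend on how negative numbers are represented (the variable names are illustrative only):

```c
#include <limits.h>

/* Converting -1 to an unsigned type is defined as reduction modulo
 * UINT_MAX + 1, so this is UINT_MAX on every conforming implementation. */
static const unsigned int all_ones_from_minus_one = -1;

/* ~ applied to an unsigned operand: no signed bit pattern is involved,
 * so there is no negative zero or trap representation to worry about. */
static const unsigned int all_ones_from_complement = ~0u;

/* The most direct spelling of the same value. */
static const unsigned int all_ones_from_limits = UINT_MAX;
```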

Dennis · Accepted Answer · score 10 · 2011-10-15T15:18:38.810

Look at these lines:

1. printf("%X", ~(~0 << 31) );
2. printf("%X", (1 << 31) - 1 );

Line 1 compiles and behaves as expected.

Line 2 gives the warning "integer overflow in expression".

This is because 1 << 31 is treated by default as a signed int, so 1 << 31 = -2147483648, which is the smallest possible integer.

As a result, subtracting 1 causes an overflow.
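For comparison, a sketch of the same two expressions with an unsigned operand, as Juraj Blaho suggests in the comments, assuming `unsigned int` is at least 32 bits wide. Neither form then involves signed overflow or shifting a negative value, and both produce the same mask (the function names are hypothetical):

```c
/* (1u << 31) is 0x80000000 in unsigned arithmetic, so subtracting 1
 * yields the low 31 bits set without any signed overflow. */
static unsigned int mask31_shift_subtract(void)
{
    return (1u << 31) - 1u;
}

/* ~0u << 31 clears the low 31 bits of an all-ones unsigned value;
 * complementing that leaves exactly the low 31 bits set. */
static unsigned int mask31_double_complement(void)
{
    return ~(~0u << 31);
}
```

With these, `printf("%X", mask31_shift_subtract())` prints `7FFFFFFF`.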

edited Oct 15 '11 at 15:18

answered Oct 05 '11 at 10:07

Dennis

You can do `(1u << 31) - 1` without warning. – Juraj Blaho Oct 05 '11 at 10:11
@Juraj Blaho: Yes, but that wasn't the question. `((1 << N) - 1)` simply doesn't work. – Dennis Oct 05 '11 at 10:16
Thank you Dennis! Yes, it does warn. :) But can't the overflow still be ignored when the result is used as a bit mask? – MS. Oct 05 '11 at 10:20
@MS.: Signed overflow is UB, so it may or may not work depending on the compiler. GCC optimizes aggressively based on the assumption that signed integers never overflow. – Juraj Blaho Oct 05 '11 at 10:23
@R.. nicely put for saying that this answer is just plain wrong. It is really a pity that compilers don't warn about `1 << 31`. That it is `-2147483648` is just a coincidence. If `int` is 32 bit wide the result of `1 << 31` is not representable in `int` that's it, and we have UB. – Jens Gustedt Oct 05 '11 at 20:40

score 4 · Answer 2 · answered Oct 05 '11 at 18:59

The first form is definitely not preferred, and I would go so far as to say it should never be used. On a ones' complement system that does not support negative zero, ~0 may very well be a trap representation and thus invoke UB when used.

On the other hand, 1<<31 is also UB, assuming int is 32-bit, since it overflows.

If you really mean 31 as a constant, 0x7fffffff is the simplest and most correct way to write your mask. If you want all but the sign bit of an int, INT_MAX is the simplest and most correct way to write your mask.

As long as you know the bitshift will not overflow, (1<<n)-1 is the correct way to make a mask with the lowest n bits set. It may be preferable to use (1ULL<<n)-1 followed by a cast or implicit conversion in order not to have to worry about signedness issues and overflow in the shift.

But whatever you do, don't use the ~ operator with signed integers. Ever.
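A minimal sketch of the `(1ULL<<n)-1` approach described above, assuming a target that provides `uint32_t`; `low_bits_mask` is a hypothetical name. Because `unsigned long long` is at least 64 bits, the shift stays in range for every `n` from 0 to 32, and converting the result to an unsigned type is always well defined:

```c
#include <stdint.h>

/* Mask with the lowest n bits set, intended for 0 <= n <= 32.
 * The shift is done in unsigned long long (at least 64 bits), so it
 * never overflows and never shifts by the operand's full width; the
 * final conversion to uint32_t is well defined for unsigned targets. */
static uint32_t low_bits_mask(unsigned int n)
{
    return (uint32_t)((1ULL << n) - 1u);
}
```

The same helper covers `n == 32`, where `(1u << n) - 1` with a 32-bit `unsigned int` would be undefined.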

answered Oct 05 '11 at 18:59

R.. GitHub STOP HELPING ICE

I would never have expected that `1 << 31` is undefined behaviour. Literally all the texts I had read until today stated that a `signed int` can take all values from `-2147483648` to `2147483647`, both inclusive. Question: does `#define INT_MIN (-2147483647 - 1)` fix this? I ask since this is what `limits.h` does in gcc, tcc and icl... – Dennis Oct 05 '11 at 22:22
The fact that `signed int` can take all values in that range has nothing to do with the fact that `1<<31` overflows. It's simply a matter of arithmetic. 2^31 is greater than `INT_MAX` (assuming 32-bit int) and thus it's an overflow. – R.. GitHub STOP HELPING ICE Oct 06 '11 at 01:19
Then it seems like I'm not understanding well the bitshift operation. I learned once that `1 << 31`, rather than `2^31`, is `1` shifted 31 places to the left and padded on the right side with `0s`. That would yield `10000000 00000000 00000000 00000000` in binary or `0x80000000` in hexadecimal or `-2147483648` as signed integer. Where did I go wrong? – Dennis Oct 06 '11 at 01:26
In C, `x< – R.. GitHub STOP HELPING ICE Oct 06 '11 at 01:44
I don't believe this can invoke undefined behavior. C99 6.3.1.3 §3 `"Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised."` – Lundin Oct 06 '11 at 07:05
@Lundin: That language is about type conversions. No conversion is taking place here. If you used `1U<<31`, then converted that to `int`, the behavior would be implementation-defined, not undefined. And if you used `(1U<<31)-1`, the result would be within the range of `int` and thus the conversion would be well-defined. – R.. GitHub STOP HELPING ICE Oct 06 '11 at 12:33
@R.. `~` comes with integer promotion, but OK, in this specific example there won't be any type conversion, as the 0 is of type signed int. So you are correct: I found that the relevant part of the standard is 6.2.6.2 §4, which indeed states that this is UB. So formally ~0 is indeed bad practice, even though personally I have yet to see a ones' complement system in the real world. – Lundin Oct 07 '11 at 11:01

score 1 · Answer 3 · answered Oct 05 '11 at 11:18

I would discourage both: shift or complement operations on signed values are simply a bad idea. Bit patterns should always be produced in unsigned types and then (if necessary at all) converted to their signed counterparts. Using the primitive types is not such a good idea either, because with bit patterns you usually want to control the number of bits you are handling.

So I'd always do something like

-UINT32_C(1)
~UINT32_C(0)

which are completely equivalent; in the end this just amounts to using UINT32_MAX and co.

A shift is only necessary when you don't shift fully, something like

(UINT32_C(1) << N) - UINT32_C(1)
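A compilable sketch of the fixed-width spellings above, assuming `<stdint.h>` provides `uint32_t`; the names are illustrative. The casts are a precaution for the unusual case where `uint_least32_t` promotes to a wider signed `int`:

```c
#include <stdint.h>

/* The two all-ones spellings from the answer, pinned to 32 bits
 * regardless of the width of int. */
static const uint32_t all_ones_neg = (uint32_t)-UINT32_C(1);
static const uint32_t all_ones_cpl = (uint32_t)~UINT32_C(0);

/* Partial-width mask, intended for 0 <= n <= 31. n == 32 may shift the
 * operand by its full width, which is undefined behaviour. */
static uint32_t partial_mask(unsigned int n)
{
    return (uint32_t)((UINT32_C(1) << n) - UINT32_C(1));
}
```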

score 0 · Answer 4 · answered Oct 05 '11 at 10:14

I wouldn't prefer one over the other, but I've seen many bugs with (1<<N) where the value had to be 64-bit but "1" was 32-bit (ints were 32-bit), so the result was wrong for N>=31. Using 1ULL instead of 1 would fix it. That's one danger of such shifts.

Also, shifting an int by CHAR_BIT*sizeof(int) or more positions is undefined, and similarly for a long long (often 64-bit) shifted by CHAR_BIT*sizeof(long long) or more positions. Because of that, it may be safer to shift right like this: ~0u>>(CHAR_BIT*sizeof(int)-N), but in this case N can't be 0.
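A sketch of the two fixes mentioned in this answer, assuming `unsigned int` has no padding bits so that `CHAR_BIT * sizeof(unsigned int)` is its width; `UNSIGNED_INT_BITS`, `mask_by_right_shift`, and `mask64` are hypothetical names:

```c
#include <limits.h>
#include <stdint.h>

/* Width of unsigned int in bits, assuming no padding bits
 * (true on mainstream platforms, not guaranteed by the standard). */
#define UNSIGNED_INT_BITS (CHAR_BIT * sizeof(unsigned int))

/* Right-shift form: valid for 1 <= n <= UNSIGNED_INT_BITS.
 * n == 0 would shift by the full width, which is undefined. */
static unsigned int mask_by_right_shift(unsigned int n)
{
    return ~0u >> (UNSIGNED_INT_BITS - n);
}

/* The 64-bit bug described above: with a plain 1, the shift happens in
 * int and breaks for n >= 31 (assuming 32-bit int). Starting from 1ULL
 * keeps everything in unsigned long long; valid for 0 <= n <= 63. */
static uint64_t mask64(unsigned int n)
{
    return (uint64_t)((1ULL << n) - 1u);
}
```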

answered Oct 05 '11 at 10:14

Alexey Frunze

Doesn't `~0` also have the problem of being 32 bits? – harold Oct 05 '11 at 10:21
@harold: ~0 will be 32-bit if ints are 32-bit. But what problem are you referring to? Oh, I see: in `~0u>>(CHAR_BIT*sizeof(int)-N)`, N must satisfy `1 <= N <= CHAR_BIT*sizeof(int)`. You can extend this expression to long long appropriately. – Alexey Frunze Oct 05 '11 at 10:27
Well I mean, it doesn't give a reason to pick ~(~0 << N) over (1 << N) - 1 – harold Oct 05 '11 at 10:30
@harold: I just pointed out a few problems I'd encountered with such constructs. Something to watch out for. – Alexey Frunze Oct 05 '11 at 10:33

Pete Wilson · Answer 5 · score 0 · 2011-10-05T10:22:41.337

EDIT: corrected a stupid error; and noted possible overflow problems.

I have never heard that one form is preferred over the other. Both forms are evaluated at compile time. I always use the second form, and I've never gotten into any trouble. Both forms are perfectly clear to the reader.

Other answers noted the possibility of overflow in the second form.

I see little to choose between them.
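When the shift count is a constant, both spellings are integer constant expressions, so a compiler can evaluate either one at compile time. A sketch using a C11 `_Static_assert`, assuming a C11 compiler; `MASK_BITS` is a hypothetical constant, and unsigned operands are used so neither form relies on signed overflow:

```c
/* Hypothetical constant mask width. */
#define MASK_BITS 12u

/* Both expressions are integer constant expressions here, so the
 * comparison is checked entirely at translation time. */
_Static_assert(((1u << MASK_BITS) - 1u) == ~(~0u << MASK_BITS),
               "both mask spellings agree for a constant width");
```

If `N` is only known at run time, neither form can be folded to a constant, and both compile to ordinary shift and arithmetic instructions.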

edited Oct 05 '11 at 10:22

answered Oct 05 '11 at 10:14

Pete Wilson

How could the first be evaluated at compile time and the second could not? – Juraj Blaho Oct 05 '11 at 10:17

score -2 · Answer 6 · answered Oct 05 '11 at 10:20

Why Discouraged
~0 is a single-cycle operation and hence faster; ((1<<N)-1) first does a shift and then a subtraction, which is an arithmetic operation, so the subtraction will consume a lot of cycles and add unnecessary overhead.

More
Moreover, ((1 << N) - 1) and ((M << N) - 1) are the same, assuming N is M's size in bits, because the shift flushes out all the bits. Here 1 is an int, typically 32 bits on almost all present 32/64-bit platforms, so N can be assumed to be 32.

The result will not be the same, however, if you cast 1 to long and do (((long)1 << 32) - 1); here you need to use 64 in place of 32, 64 being the size of long in bits.
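One caution on the full-width case: the C standard (C99 6.5.7p3) makes a shift by a count greater than or equal to the width of the promoted left operand undefined, so `1 << 32` with a 32-bit `int` and `(long)1 << 64` with a 64-bit `long` are not guaranteed to flush the bits. A sketch of a 64-bit mask that special-cases the full width instead, assuming `<stdint.h>` provides `uint64_t`; `low_mask_u64` is a hypothetical name:

```c
#include <stdint.h>

/* Mask with the lowest n bits of a 64-bit value set, for 0 <= n <= 64.
 * n == 64 is answered without shifting, because shifting a 64-bit
 * operand by 64 is undefined behaviour rather than producing 0. */
static uint64_t low_mask_u64(unsigned int n)
{
    return n >= 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1u;
}
```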

answered Oct 05 '11 at 10:20

Abhinav

I am also interested to know whether the computation ((1<<32)-1) will be done at compile time or at run time. – Abhinav Oct 05 '11 at 10:22
That's not true for any platform I know, but more importantly, any compiler that isn't completely braindead will use zero instructions and just calculate the constant. – harold Oct 05 '11 at 10:24
harold: what's not true?
in 1< – Abhinav Oct 05 '11 at 10:31
Neither shifts nor subtraction would consume lots of cycles. Shifts by more than 1 used to be slow back in the stone age though. – harold Oct 05 '11 at 10:38
Which means that logical shift and arithmetic operations have the same complexity? I don't think so :) – Abhinav Oct 05 '11 at 10:40
Essentially yes, see http://en.wikipedia.org/wiki/Barrel_shifter, though off the top of my head, Core2 has a lower throughput for shifts than for "simple" ops (2/clock vs 3/clock). It doesn't matter though: it will be a constant anyway, with no shifting done at run time. – harold Oct 05 '11 at 10:48
If N is a constant, which I think we can assume, shift and complement operators are both going to be optimized to constants, so there is no difference in efficiency between the two examples. – Lundin Oct 05 '11 at 11:26
@Lundin: Why must N be a constant? – Juraj Blaho Oct 06 '11 at 06:18
@Juraj If it weren't a constant, the compiler could not optimize it at compile time. The compiler can of course still make predictions: if you have an auto variable on the stack like `int N=5;` and that variable isn't touched before the shifting, the compiler can still treat it like a constant. But as soon as "N" is changed at run time, the compiler can't do a thing. – Lundin Oct 06 '11 at 06:59
@Lundin: I understand that. I just wanted to point out that there are situations when the whole term could not be optimized to a constant. And in these situations it makes sense to think about performance differences between the two options. – Juraj Blaho Oct 06 '11 at 07:08
@Juraj No, I don't think that makes sense. ~0 and the shift implementation are only equivalent if the shift implementation is using a constant. I wouldn't expect the number of bits in an integer to vary in runtime... – Lundin Oct 06 '11 at 10:54
Everyone gives a reason for a +1; why aren't you giving one for the -1? – Abhinav Oct 10 '11 at 10:25
@Juraj If N is not a constant equal to the size of the 1 used above (32 for int, 64 for long), what purpose does it serve? For 1, the best assumption about its type is int. If we write ((1<<4)-1), I don't see any reason to do it without a constant N of the size mentioned above. If you have a good example with N as a variable holding an arbitrary value, please explain its use. – Abhinav Oct 10 '11 at 10:32
