Undefined behavior of right-shift in C++

Question

From cppreference.com:

For unsigned a and for signed a with nonnegative values, the value of a >> b is the integer part of a/2^b . For negative a, the value of a >> b is implementation-defined (in most implementations, this performs arithmetic right shift, so that the result remains negative).

In any case, if the value of the right operand is negative or is greater or equal to the number of bits in the promoted left operand, the behavior is undefined.

Why is the behavior undefined when the right operand is greater than or equal to the number of bits in the promoted left operand?
It seems to me that the result should be 0 (at least for unsigned/positive integers)...

In particular, with g++ (version 4.8.4, Ubuntu):

unsigned int x = 1;
cout << (x >> 16 >> 16) << " " << (x >> 32) << endl;

gives: 0 1

See: https://stackoverflow.com/questions/19636539/arithmetic-right-shift-gives-bogus-result/19636580 - not actually a duplicate question of this one, but certainly explains that it is undefined behaviour, and why. Short version: shift instructions in modern processors often won't shift by more than the register bit width. — davmac, Oct 04 '18 at 12:20
Guaranteeing that the result is `0` may incur overhead that most of the time you don't need. Why would you shift by more bits than the number has? – 463035818_is_not_an_ai, Oct 04 '18 at 12:20
Some right-shift instructions use only 5 bits for the second operand. The low 5 bits of 32 are 0, so on such machines you effectively get 1 >> 0. – mch, Oct 04 '18 at 12:21
You can make your own: `unsigned int right_shift(unsigned int x, int shift_amount) { if (shift_amount >= std::numeric_limits<unsigned int>::digits) return 0; return x >> shift_amount; }` Notice the extra work to check the size? That's why `>>` by itself doesn't do that. – Eljay, Oct 04 '18 at 12:32
At least on `x86` shift instruction considers/masks only lower `5` bits of the value, i.e. it's basically `x >> (num % 32)` (same for `<<`). Implementing it as you want would require an expensive (relatively) branch to check if `0 < num < 32`. — Dan M., Oct 04 '18 at 12:40
Long read, but super interesting if you're curious about undefined behaviour, especially the motivation behind it https://blog.regehr.org/archives/213 — krsteeve, Oct 04 '18 at 19:59
@Dmitry, not really a dupe - this question understands the specified behaviour, but wants to understand the *motivation* for not defining out-of-range shifts. — Toby Speight, Oct 05 '18 at 07:32

score 33 · Accepted Answer · answered Oct 04 '18 at 12:26

One of the goals of C++ is to allow for fast, efficient code, "close to the hardware". And on most hardware, an integer right shift or left shift can be implemented by a single opcode. The trouble is, different CPUs behave differently when the shift count is greater than or equal to the number of bits.

So if C++ mandated a particular behavior for shift operations, when producing code for a CPU whose opcode behavior doesn't match all the Standard requirements, compilers would need to insert checks and logic to make sure the result is as defined by the Standard in all cases. This would need to happen to almost all uses of the built-in shift operators, unless an optimizer can prove the corner case won't actually happen. The added checks and logic would potentially slow down the program.
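
As a rough illustration of the "checks and logic" described above, here is a minimal sketch of what every shift would effectively have to become if the Standard required a result of zero for out-of-range counts. The name `checked_shr` is hypothetical (it is not a standard library function), the sketch covers unsigned types only, and it assumes C++14 or later.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper, not part of the standard library: a right shift whose
// result is defined for every count, at the cost of one extra comparison.
template <typename T>
constexpr T checked_shr(T value, unsigned count) {
    static_assert(std::is_unsigned<T>::value,
                  "this sketch covers unsigned types only");
    // digits is the number of value bits, e.g. 32 for a 32-bit unsigned int.
    if (count >= static_cast<unsigned>(std::numeric_limits<T>::digits)) {
        return 0;  // the result the question expected
    }
    // The count is in range here, so the built-in shift is well defined.
    return static_cast<T>(value >> count);
}

// Compile-time checks of the intended behaviour.
static_assert(checked_shr<std::uint32_t>(1u, 32) == 0u, "out-of-range count yields 0");
static_assert(checked_shr<std::uint32_t>(0x80000000u, 31) == 1u, "in-range count shifts normally");
```

The comparison is the overhead in question: it is cheap in isolation, but it would apply to every shift whose count the optimizer cannot prove to be in range.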

answered Oct 04 '18 at 12:26

aschepler

This doesn't explain why it isn't implementation-defined behavior, rather than completely undefined. That would also allow for "close to the hardware" implementations. – Ruslan Oct 04 '18 at 16:22
Implementation-defined would mean that compilers handle it. If we're just passing stuff to the CPU it's not implementation defined. The makers of the compilers have no idea what happens. You could just check what a specific CPU does with the instructions and make a list of what happens for what CPU. Still it wouldn't be implementation-defined, it would be environment-defined. – xyious Oct 04 '18 at 18:51
@Ruslan Implementation defined behavior lets the compiler choose from a set of options. For example, if the value of such a shift was implementation defined, a CPU would be forbidden from trapping it as an error. – Cort Ammon Oct 04 '18 at 19:01
@CortAmmon right, if _value_ were implementation-defined. But what about behavior? It could also be implementation-defined, and then trapping would be allowed, but would be required to be documented and remain consistent, unlike UB which can lead to logical paradoxes or whatnot. – Ruslan Oct 04 '18 at 19:03
@Ruslan I suppose if the C++ spec were to list every possible behavior that a CPU could choose from, then that would be sufficient. – Cort Ammon Oct 04 '18 at 19:05
@CortAmmon but if the spec did that, I wouldn't be able to run C++ on my new superconducting CPU that works using base 23 arithmetic, because the spec writers never imagined such a thing would be built ;) (That's a humorous example, but the point I'm making is deadly serious). – alephzero Oct 04 '18 at 21:50
@xyious, compilers already have to be conscious of what CPUs do. – Paul Draper Oct 04 '18 at 22:43
@PaulDraper what? So you're saying I can't run a program compiled for my i7 on an AMD chip? I can't run a program compiled on a 486 on my i7? My compiler does not know my CPU and I don't see why or how we could force it to. – xyious Oct 05 '18 at 16:49
@xyious In those cases, running one executable on multiple chips is possible because the executable code includes only opcodes in a common subset supported by all the chips. And those opcodes do have very strictly defined behavior, and if a chip manufacturer claims to support them but their chip has different behavior, that chip is considered buggy. So the compiler doesn't necessarily need to know what the CPU *is*, but it can rely on what the CPU *does*. – aschepler Oct 05 '18 at 22:18
@xyious, yes exactly! An x86 executable does not necessarily work on ARM or vice versa. (Though in that specific example, it is possible to create executables that work on both.) And compilers can get even more specific than those general ISAs; see the `-march` and `-mtune` flags of gcc. – Paul Draper Oct 06 '18 at 05:29
In this specific example, to make it implementation-defined would mean that the compiler somehow puts something into the code instructing the CPU how to handle a shift. How would that work? Are you going to make AMD and Intel implement an opcode so you can tell them how exactly shifting works? – xyious Oct 08 '18 at 16:12
@xyious No new or changed opcode semantics would be necessary. In pseudocode, we could have the compiler convert `val = a >> b;` into: `if (b > -32 && b < 32) val = __native_rightshift_opcode(a, b); else val = 0;` – aschepler Oct 08 '18 at 22:47
I would question the use of that. What you posted should be in a library, not the compiler. The compiler is there to compile, not completely change the meaning of code. I will agree that it should be in a library, considering there's a use for it and the alternative is undefined. – xyious Oct 09 '18 at 15:21
@xyious Of course it's questionable, at least in C++. (In some other languages, it's infrequently or never the case that even a "plain" operator acting on "built-in" types will map to just one CPU opcode.) But the point here is that if the Standard did specify a result of zero in those cases, then that sort of ugly invisible transformation is the only way an implementation could be correct! – aschepler Oct 09 '18 at 21:17

score 1 · Answer 2 · answered Oct 05 '18 at 08:15

To give a specific example, x86 trims the shift count to 5 bits (6 bits for 64-bit shifts), while ARM trims the shift count to 8 bits. With the current C++ standard, compilers for both CPUs can implement shifts with a single opcode.

If the C++ standard were to define the outcome of shifts by more than the operand length in a particular way, compilers targeting at least one of the CPU families (and possibly both, if the outcome required by C++ wouldn't match either hardware implementation, like the behaviour you suggest) would have to implement each shift operation using branches which would produce the required result where the CPU opcode wouldn't.
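
To make the difference concrete, the two hardware behaviours described above can be modelled in portable C++. This is only a sketch: the function names are hypothetical, the models cover 32-bit logical right shifts only, and the ARM model assumes the 32-bit register-specified shift, where a trimmed count of 32 or more produces 0.

```cpp
#include <cstdint>

// Model of the x86 behaviour: only the low 5 bits of the count are used
// for 32-bit shifts, so a count of 32 acts like a count of 0.
constexpr std::uint32_t shr_x86_model(std::uint32_t value, std::uint32_t count) {
    return value >> (count & 31u);
}

// Model of the 32-bit ARM behaviour (assumption: register-specified shift):
// the low 8 bits of the count are used, and a trimmed count of 32 or more
// shifts every bit out, giving 0.
constexpr std::uint32_t shr_arm_model(std::uint32_t value, std::uint32_t count) {
    return (count & 0xFFu) >= 32u ? 0u : value >> (count & 0xFFu);
}

// The two models disagree on a count of 32 ...
static_assert(shr_x86_model(1u, 32u) == 1u, "x86 model: 32 is trimmed to 0");
static_assert(shr_arm_model(1u, 32u) == 0u, "ARM model: 32 shifts everything out");
// ... and neither gives 0 for a count of 256, which both trim to 0.
static_assert(shr_x86_model(1u, 256u) == 1u, "x86 model: 256 is trimmed to 0");
static_assert(shr_arm_model(1u, 256u) == 1u, "ARM model: 256 is trimmed to 0");
```

The x86 model gives 1 for `1 >> 32`, which is consistent with the output reported in the question, although with undefined behaviour a compiler is not obliged to produce that value. Neither model yields 0 for every out-of-range count, so a Standard that required 0 would force extra code on both targets.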

[The 8086 does not mask the shift count. However, all other IA-32 processors (starting with the Intel 286 processor) do mask the shift count to 5 bits, resulting in a maximum count of 31. This masking is done in all operating modes (including the virtual-8086 mode) to reduce the maximum execution time of the instructions.](http://www.felixcloutier.com/x86/SAL:SAR:SHL:SHR.html) — phuclv, Oct 18 '18 at 02:42
