
Simple code snippet:

```c
#define FOO 7

int bar = -875;
bar <<= FOO;   /* UBSAN reports this shift as undefined behavior */
```

This is being reported by UBSAN as UB.

My understanding is that -875 << 7 is just -(875<<7) and there is no overflow.

So, is there a real problem here?

Jacko

2 Answers


Your understanding is incorrect.

Firstly, you used the bar <<= FOO syntax. This explicitly shifts bar, and bar is negative. Left-shifting a negative value produces undefined behavior in C. There's no way bar <<= FOO can be interpreted as -(875 << 7).

Secondly, concerning -875 << 7 and operator precedence: unary operators always bind tighter than binary ones, which means that -875 << 7 is (-875) << 7, not -(875 << 7). And, again, left-shifting a negative value produces undefined behavior in C.
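To make the grouping and the defined alternative concrete, here is a minimal sketch (illustrative only; it assumes a 32-bit int so that 112000 is representable):

```c
#include <stdio.h>

int main(void) {
    /* Unary minus binds tighter than <<, so -875 << 7 parses as
       (-875) << 7: a left shift of a negative value, which is
       undefined behavior in C, so it is left commented out here. */
    /* int ub = -875 << 7; */

    /* The value the question had in mind, -(875 << 7), is well
       defined: 875 is non-negative and 875 * 2^7 = 112000 fits
       in a 32-bit int. */
    int ok = -(875 << 7);
    printf("%d\n", ok);   /* prints -112000 */
    return 0;
}
```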

AnT stands with Russia
  • Thanks, @AnT. In my case, I am converting to 7-bit fixed point when I perform the shift. So, what is the correct way of converting negative numbers to fixed point? – Jacko Apr 20 '16 at 17:59
  • @Jacko: Multiply it instead of shifting? Multiply it by 128. The compiler will still be able to generate a shift for that (if that's indeed the most efficient way to do it), but you will not have to go through UB (see the sketch after these comments). – AnT stands with Russia Apr 20 '16 at 18:02
  • From C11 standard: 6.5.7 Bitwise shift operators [...] The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined. – njuffa Apr 20 '16 at 18:31
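Following up on the multiply-by-128 suggestion in the comments above, here is a minimal sketch of a Q7 fixed-point conversion (the helper name to_q7 is illustrative, not from the question's code):

```c
#include <stdio.h>

/* Convert to Q7 (7 fractional bits) by multiplying instead of shifting.
   Signed multiplication is well defined whenever the product is
   representable in the type, and a compiler is free to emit a shift
   instruction for a multiplication by a power of two. */
static int to_q7(int value) {
    return value * 128;   /* defined for negative inputs, unlike value << 7 */
}

int main(void) {
    printf("%d\n", to_q7(-875));   /* prints -112000 */
    return 0;
}
```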

On a sign-magnitude machine, it is unclear what the effect of left-shifting a negative number should be, and it would not be unreasonable for such a machine to trap if such an operation was attempted. On such a machine, imposing any requirements on the behavior of a negative integer left-shift would likely have required compilers for such machines to generate extra code even in cases where the values to be shifted would always be positive. To avoid imposing such costs, the authors of the Standard declined to mandate any particular behavior.

Ones'-complement and two's-complement platforms would have no logical reason to trap when shifting a negative value, though on a ones'-complement machine it would be unclear whether -1<<1 should yield -2 or -3: shifting the bit pattern of -1 (all ones except the lowest bit) left by one produces the pattern for -3, while arithmetic doubling would produce -2. Even so, the authors of the Standard saw no reason to say that left-shifting of negative values has Undefined Behavior on platforms which use sign-magnitude integers, Implementation-Defined Behavior on platforms that use ones'-complement, and Standard-defined behavior on platforms which use two's-complement, since any two's-complement implementation would regard -1<<1 as yielding -2 whether or not the Standard mandated it, unless its author was being deliberately obtuse.

Until probably 2005 or so, there would have been nothing even imaginably unsafe about code that would only ever be called upon to run on commonplace two's-complement machines using the left-shift operator on a negative value. Unfortunately, around that time an idea started to take hold: that a compiler which avoids doing anything the Standard doesn't mandate can be more "efficient" than one which behaves usefully in situations the Standard leaves undefined, and that such "efficiency" is desirable. I am not yet aware of compilers treating the statement y=x<<1; as retroactively making the value of x non-negative, but I see no reason to believe they won't do so in the future. So unless or until some agency officially codifies the behavioral guarantees which mainstream microcomputer C compilers unanimously upheld for 25+ years, such code cannot be considered "safe".
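As a purely hypothetical sketch of the kind of inference described above (no compiler is claimed to do this today; the function f is illustrative):

```c
#include <stdio.h>

/* Hypothetical: a compiler that assumed x << 1 implies x >= 0 (because
   a negative x would make the shift undefined behavior) could treat
   the branch below as unreachable and delete the check entirely. */
static int f(int x) {
    int y = x << 1;      /* undefined if x is negative */
    if (x < 0)
        return -1;       /* a UB-exploiting optimizer might remove this */
    return y;
}

int main(void) {
    printf("%d\n", f(10));   /* well defined: prints 20 */
    return 0;
}
```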

supercat
  • Thanks, @supercat. I am learning to treat shift operators with more respect :) Fast and potentially deadly. – Jacko Apr 20 '16 at 19:00
  • @Jacko: Hopefully there will someday be a recognized split between "sane C", where shift operators will be free of side-effects, and the presently-fashionable "Obtuse C", which goes out of its way to avoid letting programmers do anything useful beyond what the Standard requires. – supercat Apr 20 '16 at 19:27
  • With regard to the final paragraph, the following might be useful follow-up reading: Wang, Xi, et al. "A differential approach to undefined behavior detection." Communications of the ACM 59.3 (2016): 99-106. ([online](https://people.csail.mit.edu/nickolai/papers/wang-stack-cacm.pdf)) – njuffa Apr 20 '16 at 20:27
  • What happened in 2005 to start the "undefined :=> unexecuted" trend? I have only recently become aware of it, and it hurts my brain. I feel like an unwritten contract between programmer and compiler is being broken. – AShelly Apr 20 '16 at 20:49
  • @AShelly: I'm not sure what precipitated it. I haven't found any evidence of the trend prior to then in the time I've been looking, but I have enough of a recollection of something happening around 2005 that I can't rule out earlier instances. – supercat Apr 20 '16 at 20:52
  • @AShelly: As for the contract, it's simple: the C89 Standard was written with the intention that it should form the common core for a variety of machine-specific dialects that would be extended to fit different platforms' unique abilities. The authors of the rationale noted, for example, that having "unsigned char" promote to "signed int" would not prevent a majority of then-current platforms from yielding deterministic arithmetically-correct behavior if "unsigned x=uchar1*int1;" yielded a result between INT_MAX+1u and UINT_MAX, even though such code might fail on other platforms. – supercat Apr 20 '16 at 20:56
  • @AShelly: It was widely seen as important that compilers not only comply with the Standard, but also abide by behavioral guarantees that were offered by compilers for similar platforms. If 100.0% of compilers would honor a behavioral guarantee except when invoked in hyper-pedantic mode, nobody had reason to think it mattered whether the Standard actually required it. – supercat Apr 20 '16 at 20:59
  • @njuffa: Looking through the bibliography of that paper, it seems the trend in question is mostly fairly recent. The 2007 reference I saw, concerning (n+100 > n), falls into a class of optimization I would regard as helpful: allowing compilers to treat integers as having extra precision that may appear or disappear at the compiler's leisure. That is something very different from hyper-modernism. – supercat Apr 20 '16 at 21:19
  • @supercat I became aware of the trend around 2006 or 2007, when I was bitten by gcc's introduction of new optimizations exploiting undefined behavior, which "broke" code I had used without issues for more than a decade prior to that, across three platforms and five toolchains. BTW, a more detailed version of the paper in CACM by the same authors is: Wang, Xi, et al. "A differential approach to undefined behavior detection." ACM Transactions on Computer Systems (TOCS) 33.1 (2015): 1. – njuffa Apr 20 '16 at 21:44
  • @njuffa: What irks me is the attitude of gcc's maintainers that code which relies upon behavior which isn't defined by the Standard, but had been treated 100% consistently by every modern compiler, should be considered "broken", especially when "fixing" the code would effectively block what would otherwise be useful optimizations. If C is to survive as a useful language, it needs to add normative specifications so that programs could say which corner-case behaviors they require, and compilers could then either accept those programs (and honor the requirements) or reject them. – supercat Apr 20 '16 at 22:01
  • @njuffa: If the authors of a compiler for some platform judge that nobody writing code for the platform would care about some guarantee, there would be no need for them to burden the compiler with it, but on the flip side code which would benefit from a guarantee that just about any platform should be able to offer cheaply shouldn't have to refrain from using it just because some obscure platform can't. – supercat Apr 20 '16 at 22:03