
I have the following piece of code:

typedef struct {
        int x;
        int y;
        int z;
        int w;
} s32x4;

s32x4
f() {
        s32x4 v;
        v.x = 0;

        return v;
}

which generates (gcc -O2):

f:
        xor     eax, eax
        xor     edx, edx          ; this line is questionable
        ret

where clang outputs (clang -O2):

f:                                      # @f
        xor     eax, eax
        ret

Questions

  • Is there a reason why GCC inserts an XOR there?
  • If there isn't a good reason for it: can I somehow get rid of it?

Note

Peter Cordes
  • It's the way to set eax to 0, if you xor a number with itself. – Antonin GAVREL Apr 07 '21 at 05:54
  • @AntoninGAVREL I know that, but why does it set both `EDX` and `EAX`? I only need `EAX` here, if I'm not wrong. –  Apr 07 '21 at 05:55
  • Have you tried with -O3 as well? – Antonin GAVREL Apr 07 '21 at 05:58
  • @AntoninGAVREL yes, the Godbolt link is under the Note section, in case you want to play with it. –  Apr 07 '21 at 05:59
  • GCC is using the pair `rdx:rax` to return the struct, since the struct is exactly 16 bytes large. `xor eax,eax` clears the entire `rax` (likewise for `edx`/`rdx`). – Michael Apr 07 '21 at 06:01
  • @Michael Thank you for the answer. Random question: what kind of advantage does it give to us? Or maybe rephrase it in this way: why doesn't clang do that? –  Apr 07 '21 at 06:03
  • There's some question as to whether it's UB to return a `struct` with some members uninitialized, see https://stackoverflow.com/questions/47433041/is-using-a-structure-without-all-members-assigned-undefined – Nate Eldredge Apr 07 '21 at 06:06
  • Try compiling with `gcc -Wall -Wextra -O3 -fverbose-asm -S hrant.c` then look inside the generated `hrant.s`. You can also use [GCC developer options](https://gcc.gnu.org/onlinedocs/gcc/Developer-Options.html) or write your own [GCC plugin](https://gcc.gnu.org/onlinedocs/gccint/Plugins.html) or use [Bismon](https://github.com/bstarynk/bismon). Contact me by email to `basile.starynkevitch@cea.fr` about it – Basile Starynkevitch Apr 07 '21 at 06:08
  • @Michael: Well, they're both using `rdx:rax`. But the members that would go in `rdx` have not been initialized, so clang leaves `rdx` uninitialized likewise. The funny thing is that GCC chooses to zero `rdx` instead. Possibly just in the interests of making things a little more deterministic? – Nate Eldredge Apr 07 '21 at 06:09
  • Interesting, thank you all for the answers. –  Apr 07 '21 at 06:13
  • @NateEldredge: Some tasks would be best served by a diagnostic implementation that squawks if code returns a structure without fully initializing it, but others would be better served by an implementation that allows writes to structure members whose values will be ignored by downstream code to be omitted even if the compiler can't show that downstream code would never use them. The Standard allows implementations to behave in either of those useful fashions, and makes no attempt to forbid other silly behaviors, since compiler writers should be expected not to be gratuitously silly. – supercat Apr 07 '21 at 15:22

2 Answers


You read a partly uninitialized struct object to return it, which is (arguably) Undefined Behaviour on the spot, even if the caller doesn't use the return value.
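A minimal sketch of the straightforward way to avoid reading anything uninitialized, assuming the caller really wants all four members to have defined values: initialize the whole object. The struct is the one from the question; only the initializer differs.

```c
typedef struct {
        int x;
        int y;
        int z;
        int w;
} s32x4;

/* {0} sets x explicitly and every remaining member implicitly to 0,
   so the returned object has no indeterminate parts. */
s32x4
f(void) {
        s32x4 v = {0};
        return v;
}
```

Note the trade-off: once every member is defined, any x86-64 System V compiler has to put zeros in both halves of RDX:RAX, so an instruction like `xor edx, edx` becomes required rather than optional. The only way to omit it is to leave part of the return value indeterminate.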

The 16-byte struct is returned in RDX:RAX in the x86-64 System V ABI (any larger and it would be returned by having the caller pass a pointer to a return-value object). GCC is zeroing the uninitialized parts, clang is leaving whatever garbage was there.
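A small sketch of the size boundary described above, assuming a 32-bit `int` as on x86-64 System V. `s32x5`, `make4` and `make5` are hypothetical names used only for illustration; the comments describe how the ABI classifies the return values, not anything the code itself checks at run time.

```c
typedef struct { int x, y, z, w; } s32x4;        /* 16 bytes */
typedef struct { int x, y, z, w, extra; } s32x5; /* 20 bytes (hypothetical) */

_Static_assert(sizeof(int) == 4, "assumes 32-bit int, as on x86-64 System V");
_Static_assert(sizeof(s32x4) == 16, "fits in two 8-byte registers");
_Static_assert(sizeof(s32x5) > 16, "too large for RDX:RAX");

/* Returned in registers: x and y in RAX, z and w in RDX. */
s32x4 make4(int a) { s32x4 v = {a, a, a, a}; return v; }

/* Returned in memory: the caller passes a pointer to the result object
   in RDI, and the callee hands that pointer back in RAX. */
s32x5 make5(int a) { s32x5 v = {a, a, a, a, a}; return v; }
```

With this layout, the question's `v.x = 0` only determines the low half of RAX; y and both members in RDX are the uninitialized parts that GCC zeroes and clang leaves alone.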

GCC loves to break dependencies any time there might be a risk of coupling a false dependency into something (e.g. `pxor xmm0,xmm0` before using the badly-designed `cvtsi2sd xmm0, eax`). Clang is more "aggressive" in leaving that out, sometimes even when there's only a tiny code-size benefit for doing so, e.g. using `mov al, 1` instead of `mov eax,1`, or `mov al, [rdi]` instead of `movzx eax, byte ptr [rdi]`.
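To see the `cvtsi2sd` case from C, a plain int-to-double conversion is enough. The assembly in the comment is what GCC at -O2 typically emits for x86-64 (with the argument in EDI rather than EAX); exact output varies by version and tuning, so treat it as indicative only. `int_to_double` is a hypothetical name.

```c
/* GCC -O2 on x86-64 typically emits something like:
       pxor     xmm0, xmm0
       cvtsi2sd xmm0, edi
       ret
   The legacy-SSE cvtsi2sd writes only the low 64 bits of xmm0 and keeps
   the old upper bits, so it depends on the previous value of xmm0. The
   pxor zeroes the register first, breaking that false dependency. */
double int_to_double(int n) {
        return n;
}
```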


The simplest form of what you're seeing is returning an uninitialized plain int, with the same difference between GCC and clang code-gen:

int foo(){
    int x;
    return x;
}

(Godbolt)

# clang 11.0.1 -O2
foo:
        # leaving EAX unwritten
        ret


# GCC 10.2 -O2
foo:
        xor     eax, eax        # return 0
        ret

Here clang "gets to" leave out a whole instruction. Of course it's undefined behaviour (reading an uninitialized object), so the standard allows literally anything, including ud2 (guaranteed to raise an illegal instruction exception), or omitting even the ret on the assumption that this code-path is unreachable, i.e. the function will never be called. Or to return 0xdeadbeef, or call any other function, if you have a malicious DeathStation 9000 C implementation.

Peter Cordes

The easiest way of handling some corner cases where the Standard defines the behavior of programs that use the value of an uninitialized automatic variable is to initialize such values to zero. A compiler that does that will avoid the need for any other corner-case handling.

Consider, for example, how something like:

#include <string.h>
extern unsigned short volatile vv;
int test(int a, int mode)
{
    unsigned short x,y;

    if (mode)
        x=vv;
    memcpy(&y,&x,sizeof x);
    return y;
}

should be processed on a platform which uses 32-bit registers to hold all automatic objects of all integer types 32 bits and smaller. If mode is zero, this function should copy two unspecified byte values into the bytes comprising y and return that, causing it to hold an arbitrary number in the range 0-65535. On ARM GCC 4.5.4, however, this function would use the R0 register to hold x and y, without ever writing to that register in the `mode==0` case. This would result in y behaving as though it holds whatever was passed as the first argument, even if that value was outside the range 0-65535. Later versions avoid this issue by pre-initializing R0 to zero (which is of course always in the range 0-65535).
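The source-level equivalent of what the later GCC versions do is simply to zero `x` up front. This is a sketch, not the answer's own code: `test_zeroed` is a hypothetical rename of `test`, and `vv` is defined here (instead of declared `extern`) only so the example compiles and links on its own.

```c
#include <string.h>

/* Stand-in for the answer's `extern unsigned short volatile vv;`. */
unsigned short volatile vv;

int test_zeroed(int a, int mode)
{
    unsigned short x = 0, y; /* x now has a defined value on every path */

    (void)a;                 /* unused, as in the original */
    if (mode)
        x = vv;
    memcpy(&y, &x, sizeof x);
    return y;                /* always in the range 0-65535 */
}
```

With `mode == 0` the function now returns 0 regardless of how the compiler assigns registers, so the first argument can no longer leak into the result.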

I'm not sure if gcc's decision to zero things in the OP's example is a result of trying to preempt corner cases that might otherwise be problematic, but certainly some situations where it pre-zeroes things in cases not required by the Standard seem to stem from such a goal.

supercat
  • *copy two unspecified byte values into the bytes comprising y and return that* - ISO C says it's ok to read uninitialized values with `memcpy`? You're implying that, but I find it surprising that it's not undefined behaviour so IMO it would be good to state that 100% clearly. – Peter Cordes May 06 '21 at 17:44
  • @PeterCordes: The rule that makes reading uninitialized automatic variables UB has an explicit exception for character-type reads of objects whose address is taken, and memcpy is specified as behaving as a sequence of character-type reads and writes in cases where the source and destination do not overlap. – supercat May 06 '21 at 17:56
  • Interesting. I'm somewhat curious why they decided to make that exception or why it's useful, without generally allowing any object to be read and getting a possibly undefined bit-pattern object-representation. Anyway, I suspect the GCC behaviour is more aimed at breaking any possible false dependencies (which GCC loves to do in general, but clang often avoids when there's no loop within the same function). But it does have an interesting benefit for this case. – Peter Cordes May 06 '21 at 18:00
  • @PeterCordes: The Committee seems prone to characterize actions as having Undefined Behavior on all implementations if there is any implementation where their behavior might be hard to specify, even if the actions would otherwise have unambiguously defined behavior on 99.9% of implementations. A prime example of that is left-shifting of negative values (whose behavior was unambiguously defined in C89 for two's-complement machines where integer types have no padding bits, but became UB in C99). The Committee also seems unable to recognize that it may sometimes be useful to have... – supercat May 06 '21 at 18:49
  • ...diagnostic implementations trap attempts to perform actions which are *often* erroneous, in cases where code would have no reason to perform them, but it may also be useful for implementations to allow deliberate use of such constructs when such diagnostics are not required. Further, there are many cases where it may be useful to partially specify aspects of behavior, but under a non-deterministic model which doesn't fit the Standard's Abstract Machine. – supercat May 06 '21 at 19:00
  • @PeterCordes: My understanding is that uninitialized objects have indeterminate values, and for most types an indeterminate value could be a trap representation, which it's UB to try to read. But character types don't have any trap representations, so reading an uninitialized character just yields an indeterminate but valid value. – Nate Eldredge May 06 '21 at 19:08
  • @PeterCordes: It's too bad that the Standard relies upon implementations to give descriptions of behavior priority over statements that an action invokes UB *when there is no obvious or documented reason to do otherwise*, but doesn't acknowledge such reliance, and doesn't make clear that characterization of such an action as invoking UB is intended to allow for situations where implementations might have a good reason for deviating from the described behavior, and is not intended to be treated as a "good reason" in and of itself. – supercat May 06 '21 at 19:10
  • @NateEldredge: `_Bool` doesn't have "trap representations" per-se, but it's still UB to read an uninitialized bool, because the compiler is allowed to assume that a _Bool object is either 0 or 1 (if that's part of the ABI the implementation follows). [Does the C++ standard allow for an uninitialized bool to crash a program?](https://stackoverflow.com/a/54125820). Maybe you could argue that's a trap representation, but it seems different to me because you can sometimes read it without trapping, depending on what you do. Or are you still just talking about memcpy and `unsigned char` here? – Peter Cordes May 06 '21 at 19:12
  • @NateEldredge: Although that would be a logical reason to treat character types differently, the Standard characterizes attempts to read Indeterminate Values of types that *would be allowed to have trap representations* as UB, without regard for whether the types *do* have trap representations. – supercat May 06 '21 at 19:13
  • @PeterCordes: Why do you say _Bool doesn't have trap representations? Any bit pattern which does not have a meaning defined by its type is a trap representation, and _Bool has 254+ of them. – supercat May 06 '21 at 19:16
  • @supercat: I guess I'm fuzzy on the definition of "trap representation". I took it literally, as one where the machine actually *would* trap if you tried to do most things with it, like `~0` on some one's complement machine (where it apparently faults as input to instructions like signed-`add`, instead of working as integer negative-zero). But on x86, a `0xa1` bit-pattern would test as `true` in most cases. Your definition makes more sense, though, simply any bit-pattern that doesn't mean anything, and which compilers can assume isn't present. – Peter Cordes May 06 '21 at 19:18
  • @PeterCordes: On some platforms, trap representations are a necessary evil, and on some kinds of diagnostic implementation they can be useful if behavior is specified as trapping, but otherwise a good dialect should seek to minimize the number of actions which may have out-of-band side effects. If a program needs to process a mixture of valid and invalid data, to produce a corresponding mixture of results that will or will not matter, being able to process all of the data without regard for validity can often be faster than having to avoid processing invalid data, but... – supercat May 06 '21 at 19:33
  • ...such performance advantages will go out the window if code has to guard against the possibility that computations on invalid data might arbitrarily disrupt other aspects of program behavior. – supercat May 06 '21 at 19:36
  • @PeterCordes: BTW, if I had been writing the C Standard, I would have specified that an implementation may raise an Implementation-Defined signal if an attempt is made to read or write a bool with a value other than 0 or 1, but otherwise any such attempt may at its leisure substitute any odd number for any odd number other than 1, and any integer (odd or even) for any even integer other than zero. This would be compatible with how many existing implementations already processed "bit" types, and would in many cases have allowed more efficient code generation than is possible with _Bool. – supercat May 07 '21 at 14:51