Compiling with gcc on x86-64, I have a function with the signature

void * g(void* p, unsigned char a, unsigned char b, ...)

and a function call

long double zero = 0;
g(NULL, 0, 0, zero);

But when g() reads the zero argument via va_arg(..., long double) and compares it with == 0, the comparison is false.

I experimented with the assembly output (I don't know the x86-64 calling convention well). The assembly for the function call is

fldz
fstpt   -16(%rbp)
pushq   -8(%rbp)
pushq   -16(%rbp)
movl    $0, %edx
movl    $0, %esi
movl    $0, %edi
movl    $0, %eax
call    g

and this does not work, as described above. But when I add an instruction like this

...    
fstpt   -16(%rbp)
movw    $0, -6(%rbp) #added instruction
pushq   -8(%rbp)
...

it starts to do the right thing. But how can this have any influence if the long double type, as handled by the FPU, has only 10 bytes? As I understand it, the long double argument is passed via [RBP-16], ..., [RBP-7], but the only difference between the working and the non-working version is in [RBP-6] and [RBP-5].

Does anyone have any ideas? Thank you!

EDIT: This is the C code of g:

void * g(void* p, unsigned char a, unsigned char b, ...) {
    va_list list;
    long double x;
    va_start(list, b);
    x = va_arg(list, long double);
    if (x == 0)
        h(q,r);
...

and the respective assembler code:

g:
pushq   %rbp
movq    %rsp, %rbp
subq    $240, %rsp
movq    %rdi, -232(%rbp)
movq    %rcx, -152(%rbp)
movq    %r8, -144(%rbp)
movq    %r9, -136(%rbp)
testb   %al, %al
je      .L1635
movaps  %xmm0, -128(%rbp)
movaps  %xmm1, -112(%rbp)
movaps  %xmm2, -96(%rbp)
movaps  %xmm3, -80(%rbp)
movaps  %xmm4, -64(%rbp)
movaps  %xmm5, -48(%rbp)
movaps  %xmm6, -32(%rbp)
movaps  %xmm7, -16(%rbp)
.L1635:
movl    %esi, %eax
movb    %al, -236(%rbp)
movl    %edx, %eax
movb    %al, -240(%rbp)
movq    %fs:40, %rax
movq    %rax, -184(%rbp)
xorl    %eax, %eax
movl    $24, -208(%rbp)
movl    $48, -204(%rbp)
leaq    16(%rbp), %rax
movq    %rax, -200(%rbp)
leaq    -176(%rbp), %rax
movq    %rax, -192(%rbp)
movq    -200(%rbp), %rax
addq    $15, %rax
andq    $-16, %rax
leaq    16(%rax), %rdx
movq    %rdx, -200(%rbp)
fldt    (%rax)
fstpt   -224(%rbp)
fldt    -224(%rbp)
fldz
fucomip %st(1), %st
fstp    %st(0)
jp      .L1636
fldt    -224(%rbp)
fldz
fucomip %st(1), %st
fstp    %st(0)
jne     .L1636
leaq    .LC22(%rip), %rsi
leaq    .LC23(%rip), %rdi
call    h
.L1636:
...
Kolodez
  • It would be a lot simpler to `sub $16, %rsp` ; `fstpt (%rsp)` instead of storing somewhere else and then copying with `push`. This is what compilers do. Or for `0.0`, gcc uses `pushq $0` twice https://godbolt.org/g/fYj4oS. Also, if you prefer Intel syntax (`[rbp-16] .. [rbp-7]`), why not use that instead of AT&T? But anyway, we can see from clang's and ICC's code-gen that leaving the high 6 bytes of padding unmodified is legal. Is it possible that a misaligned stack is breaking `g()`? I think you need to debug `g`, because your caller looks correct. – Peter Cordes Jul 24 '18 at 12:57
  • I added the code of `g` up there. If I compile it on x86-32, there are no problems. However, doesn't `g` also look correct? What exactly would happen in case of a misaligned stack? And I really don't get why `g` seems to test `x==0` twice and why the instructions `jp` and `jne` would depend on the FPU flags... I agree that the compiler could optimize a lot here and that I could use Intel syntax. :) – Kolodez Jul 26 '18 at 23:23
  • So is `g` purely compiler-generated? `jp` tests for unordered (if `x` is NaN), `jne` tests for != 0.0 after it knows the comparison is ordered. If you compile `g` with optimization enabled, it would be easier to read (less store/reload noise). – Peter Cordes Jul 26 '18 at 23:32
  • What happens if you write the caller in C? Does it work then? If so, maybe single-step through `g` with a debugger (`stepi` in GDB to step by instruction), and see where things go wrong. – Peter Cordes Jul 27 '18 at 00:25