SIGSEGV in optimized version of code

Question

My knowledge of the Intel instruction set is a bit rusty. Can you tell me why I might be getting a segmentation fault in the optimized version of my function? (Bonus points if you can tell me why I don't get it in the -O0 build of the code.)

It's C code compiled by GCC 4.1.2.

Here is the result of GDB's "disas" command at the crash:

   0x00000000004263e5 <+0>:     sub    $0x8,%rsp
   0x00000000004263e9 <+4>:     movsd  %xmm2,(%rsp)
   0x00000000004263ee <+9>:     divsd  %xmm1,%xmm0
   0x00000000004263f2 <+13>:    callq  0x60f098 <log@plt>
=> 0x00000000004263f7 <+18>:    andpd  0x169529(%rip),%xmm0        
   0x00000000004263ff <+26>:    movsd  (%rsp),%xmm1
   0x0000000000426404 <+31>:    ucomisd %xmm0,%xmm1
   0x0000000000426408 <+35>:    seta   %al
   0x000000000042640b <+38>:    movzbl %al,%eax
   0x000000000042640e <+41>:    add    $0x8,%rsp
   0x0000000000426412 <+45>:    retq

And here's the original source of the function:

char is_within_range(double a, double b, double range) {
  double ratio = a / b;
  double logRatio = fabs(log(ratio));
  return logRatio < range;
}

For reference here's the non-optimized version of the code:

   0x00000000004263e5 <+0>: push   %rbp
   0x00000000004263e6 <+1>: mov    %rsp,%rbp
   0x00000000004263e9 <+4>: sub    $0x30,%rsp
   0x00000000004263ed <+8>: movsd  %xmm0,-0x18(%rbp)
   0x00000000004263f2 <+13>:    movsd  %xmm1,-0x20(%rbp)
   0x00000000004263f7 <+18>:    movsd  %xmm2,-0x28(%rbp)
   0x00000000004263fc <+23>:    movsd  -0x18(%rbp),%xmm0
   0x0000000000426401 <+28>:    divsd  -0x20(%rbp),%xmm0
   0x0000000000426406 <+33>:    movsd  %xmm0,-0x10(%rbp)
   0x000000000042640b <+38>:    mov    -0x10(%rbp),%rax
   0x000000000042640f <+42>:    mov    %rax,-0x30(%rbp)
   0x0000000000426413 <+46>:    movsd  -0x30(%rbp),%xmm0
   0x0000000000426418 <+51>:    callq  0x610608 <log@plt>
   0x000000000042641d <+56>:    movapd %xmm0,%xmm1
   0x0000000000426421 <+60>:    movsd  0x16b6b7(%rip),%xmm0
   0x0000000000426429 <+68>:    andpd  %xmm1,%xmm0
   0x000000000042642d <+72>:    movsd  %xmm0,-0x8(%rbp)
   0x0000000000426432 <+77>:    movsd  -0x8(%rbp),%xmm1
   0x0000000000426437 <+82>:    movsd  -0x28(%rbp),%xmm0
   0x000000000042643c <+87>:    ucomisd %xmm1,%xmm0
   0x0000000000426440 <+91>:    seta   %al
   0x0000000000426443 <+94>:    movzbl %al,%eax
   0x0000000000426446 <+97>:    leaveq 
   0x0000000000426447 <+98>:    retq

Have you checked the differences with the non-optimized code (assembly output)? If so, can you post it as well? — Macmade, Nov 28 '11 at 22:21
That version of GCC is > 4.5 years old. This could be an optimization bug in the old version of GCC. — Michael Hoffman, Nov 28 '11 at 22:28
Why would that instruction cause a seg fault in the first place? I assume it's something to do with the million+ addressing modes, but maybe someone could weigh in? — laslowh, Nov 28 '11 at 22:33
Output from `gcc -S` would probably be more helpful than the output of a Gdb disassembly. — Michael Hoffman, Nov 28 '11 at 22:33
Presumably you don't have access to `0x169529(%rip)`. I've compiled this code with both `-O0` and `-O3` in GCC 4.5.2, and both optimization levels use `andpd %xmm1, %xmm0`. — Michael Hoffman, Nov 28 '11 at 22:38
Have you tried running it through valgrind? The error may be there in both versions, but the optimization just makes it happen more often. — Paul Rubel, Nov 28 '11 at 22:41

score 6 · Accepted Answer · answered Nov 29 '11 at 02:00

=> 0x00000000004263f7 <+18>:    andpd  0x169529(%rip),%xmm0        
   0x00000000004263ff <+26>:    movsd  (%rsp),%xmm1

When the andpd instruction takes a memory operand, it's required to be aligned to a 16-byte boundary.

For %rip-relative addressing, the offset is applied to the address of the following instruction. So, here, the memory operand is at 0x4263ff + 0x169529 = 0x58f928, which is not 16-byte aligned. Hence the segfault.
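The address arithmetic above can be checked with a small C sketch. The helper names `rip_relative_target` and `is_aligned16` are hypothetical, introduced only for illustration; the numeric values come from the disassembly in the question.

```c
#include <stdint.h>

/* Hypothetical helper: effective address of a %rip-relative operand.
   The displacement is added to the address of the NEXT instruction. */
static uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp) {
    return next_insn_addr + (uint64_t)(int64_t)disp;
}

/* Hypothetical helper: nonzero if addr sits on a 16-byte boundary,
   i.e. its low four bits are all zero. */
static int is_aligned16(uint64_t addr) {
    return (addr & 0xF) == 0;
}
```

Calling `rip_relative_target(0x4263ff, 0x169529)` gives 0x58f928, whose last hex digit is 8, so `is_aligned16` reports it as misaligned. By the same arithmetic, the listing in Answer 3 resolves to 0x40059a + 358 = 0x400700, which is 16-byte aligned and consistent with that build not crashing.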

The compiler is directly generating code for fabs(), using an AND with an appropriate bit mask to clear the sign bit; the bit mask constant value should have been placed at an appropriate offset in a sufficiently aligned data section, but hasn't been. This could be a bug in that (old) version of GCC, or could conceivably be a linker-related issue somewhere else.
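As a rough illustration of what the inlined fabs() amounts to, here is a portable C sketch of the same sign-bit masking. The function name `fabs_by_mask` is hypothetical, and this is not the code GCC emits, only an equivalent of the AND with the mask constant.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: clears bit 63 (the IEEE 754 sign bit) of a double,
   which is the effect of the andpd with the mask constant. */
static double fabs_by_mask(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);        /* reinterpret without aliasing issues */
    bits &= UINT64_C(0x7FFFFFFFFFFFFFFF);  /* keep everything except the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

In the -O0 listing from the question, the mask is first loaded into a register with movsd, an 8-byte load that has no 16-byte alignment requirement, and the andpd then operates on registers only. That is consistent with the unoptimized build not crashing even though the constant is misplaced.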

The follow-up to this is that your answer is spot on: it turned out to be a linker bug in a non-standard linker that we're using. Thanks for the answer. — laslowh, Feb 24 '12 at 17:27

Macmade · Answer 2 · 2011-11-28T22:32:32.103

It seems to crash after the call to the log function:

callq  0x60f098 <log@plt>

So there's maybe a problem with the fabs implementation, using -O0.

Have you tried:

double logRatio = log(ratio);
logRatio = fabs(logRatio);

This may generate different assembly output, and you may get additional information about the crash.

As an alternative, you may replace the fabs call with something like:

double logRatio = log(ratio);
logRatio = (logRatio < 0) ? -logRatio : logRatio;

You may have precision issues with that, but that's not the point here...

score 1 · Answer 3 · answered Nov 28 '11 at 23:16

I'm also using gcc (GCC) 4.1.2 20070115 (SUSE Linux), here's the generated assembly:

Dump of assembler code for function is_within_range:
0x0000000000400580 <is_within_range+0>: divsd  %xmm1,%xmm0
0x0000000000400584 <is_within_range+4>: sub    $0x8,%rsp
0x0000000000400588 <is_within_range+8>: movsd  %xmm2,(%rsp)
0x000000000040058d <is_within_range+13>:        callq  0x400498 <log@plt>
0x0000000000400592 <is_within_range+18>:        andpd  358(%rip),%xmm0        # 0x400700
0x000000000040059a <is_within_range+26>:        xor    %eax,%eax
0x000000000040059c <is_within_range+28>:        movsd  (%rsp),%xmm1
0x00000000004005a1 <is_within_range+33>:        ucomisd %xmm0,%xmm1
0x00000000004005a5 <is_within_range+37>:        seta   %al
0x00000000004005a8 <is_within_range+40>:        add    $0x8,%rsp
0x00000000004005ac <is_within_range+44>:        retq

It appears to be almost the same, but I do not get a crash. I think you'll need to provide us with your compiler flags, and details of your processor and GLIBC version, and the values of a, b, and range that crash for you, as the issue is almost definitely with the log call.
