I am working on a program that uses the GNU Scientific Library (GSL). It crashes with "Illegal instruction (core dumped)" partway through solving a nonlinear equation; see the output below. What usually causes an "illegal instruction" error, and how can I debug it?

...
iter  0: A = 1.0000, lambda = 1.0000, b = 0.0000, cond(J) =   6.0000, |f(x)| = 101.0200
iter  1: A = 3.5110, lambda = -12.8820, b = 1.2364, cond(J) =  92.8216, |f(x)| = -nan
iter  2: A = 3.5110, lambda = -12.8820, b = 1.2364, cond(J) =      nan, |f(x)| = -nan
iter  3: A = 3.5110, lambda = -12.8820, b = 1.2364, cond(J) =      nan, |f(x)| = -nan
iter  4: A = 3.5110, lambda = -12.8820, b = 1.2364, cond(J) =      nan, |f(x)| = -nan
Illegal instruction (core dumped)

With gdb, I got a bit of additional info.

Program received signal SIGILL, Illegal instruction.
0x00000000004d1030 in nielsen_reject (nu=<optimized out>, mu=<optimized out>) at nielsen.c:98
98    *nu <<= 1;
(gdb) p nu
$1 = <optimized out>
(gdb) x/i $pc
 => 0x4d1030 <trust_iterate+8912>:  ud2 

The code around nielsen.c:98 looks like this:

...
static int
nielsen_reject(double * mu, long * nu)
{
  *mu *= (double) *nu;

  /* nu := 2*nu */
  *nu <<= 1;

  return GSL_SUCCESS;
}

The CPU is x86_64 according to `uname -m`. The OS is Ubuntu 16.04 running in a virtual machine on a Mac host. The GCC version is 5.4.

gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The code is compiled with gcc, with AddressSanitizer and UndefinedBehaviorSanitizer enabled (`-fsanitize=address,undefined`).

It seems that the compiler emits a `ud2` instruction when the sanitizers are enabled. Is this a compiler bug, a sanitizer bug, or a bug in the code?

zell
  • One possible cause is that you're executing data, e.g. by misuse of function pointers. Another is that either your code or the library is incorrectly compiled or configured for your CPU, and is using instructions that it doesn't support. Numerical code like gsl is likely to use SIMD extensions like SSE, AVX, etc that require a sufficiently recent CPU. – Nate Eldredge Dec 04 '20 at 16:13
  • One way to check is to run the code under a debugger like gdb, and when it faults, use `x/i $pc` to disassemble the faulting instruction. Check a CPU architecture manual for what feature set that instruction requires, and then see whether your CPU has that feature set (with its manual or /proc/cpuinfo or whatever). – Nate Eldredge Dec 04 '20 at 16:16
  • @NateEldredge Thank you. I have updated the question with more information. In particular, it seems that the compiler produces a "ud2" instruction when sanitizers are switched on. I would like to know whether this is a compiler bug, sanitizer bug, or code bug? – zell Dec 05 '20 at 11:39
  • "Code bug" is usually the smart guess. The compiler would insert ud2 instructions when some check for undefined behavior has failed, or some code path should never be reached. Given the faulting code, my first guess would be that `*nu` is overflowing. You may find this easier to debug if you recompile without optimizations. – Nate Eldredge Dec 05 '20 at 15:54
  • You're also using fairly old software. Testing with newer versions could help in two ways: (1) it could be a bug in GSL that has since been fixed; (2) the sanitizer in a newer compiler may be more helpful and less likely to have bugs of its own. – Nate Eldredge Dec 05 '20 at 15:55
