
I'd like to find out exactly where an application written in C/C++ fails. I cannot debug the application directly, either with gdb / lldb or with an IDE, because the application is launched by another program (it is a robot controller for the Webots robot simulation software). In the OS X Console I can find a 'User Diagnostic Report' which even shows a stack trace at the moment of the crash. I just need to find out where exactly in my source code the crash happens, but I don't understand the following stack trace syntax:

Exception Type:        EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:       EXC_I386_GPFLT

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_c.dylib               0x00007fff92d6b859 strtol_l + 77
1   controller_2                    0x0000000100006b57 main + 4839
2   controller_2                    0x00000001000010b4 start + 52

Apparently somewhere (+ 4839) in my int main() {} function something eventually calls strtol_l, which causes the crash. The call must be indirect, because strtol_l does not appear anywhere in the controller code.
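One possible reason `strtol_l` shows up without ever being called directly, offered as an assumption rather than something the trace confirms: C library helpers such as `atoi` and `atol` are commonly implemented on top of `strtol`, so a bad pointer passed to one of them would fault inside `strtol_l`. The sketch below shows a defensive wrapper; `parse_long_checked` is a hypothetical name, not part of the controller or of Webots.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper, not from the controller source: parse a decimal
 * integer and report failure instead of handing bad input to strtol. */
bool parse_long_checked(const char *text, long *out)
{
    if (text == NULL || out == NULL)
        return false;           /* a NULL string would fault inside strtol */

    char *end = NULL;
    errno = 0;
    long value = strtol(text, &end, 10);

    if (end == text)            /* no digits were consumed */
        return false;
    if (errno == ERANGE)        /* value does not fit in a long */
        return false;

    *out = value;
    return true;
}
```

This guard only catches NULL and malformed text. An uninitialized or dangling pointer would still crash, so the offending call site still has to be located.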

What does the + 4839 stand for? Is it a memory offset? It cannot be a source code line number, as the source code for the controller is only ~1200 lines and the controller is not compiled with debug info.

Michahell
  • I would say that the `+ 4839` is saying that it called `strtol_l` in the 4839th processor instruction after it entered `main`. If you disassemble the binary you ran you should be able to find what instruction that is and maybe find a reference to a familiar line of code in the surrounding assembly. – nonsensickle May 29 '15 at 00:47
  • It's a byte count, the offset from the beginning of `main`. – user3386109 May 29 '15 at 00:49
  • If you recompile the controller code with '-ggdb' (which will make the executable significantly larger) and then let it run until it seg faults, the stack trace will include line numbers, etc. – user3629249 May 29 '15 at 01:06
  • When the controller software seg faults, it outputs a 'core' file. Use that core file as the input to gdb. Have all the source code visible to gdb and have the original executable generated using at least '-g' and preferably '-ggdb'. Then all the details of the backtrace (when it next seg faults) will be visible, with function names, line numbers, etc. Note: since you probably did not compile the libraries from source, the details within the library functions will still be rather skimpy. However, the root of the problem, in your main function, will be very visible. – user3629249 May 29 '15 at 01:13
  • The answer to this question might be of interest to you: http://stackoverflow.com/questions/16227845/what-to-make-of-an-impossible-stack-trace-after-a-crash – Jeremy Friesner May 29 '15 at 04:29
  • Great, thanks for the comments. I will try to compile with -ggdb and let the sims run until this happens, then find out what the cause was, provided the OS X diagnostic report includes line numbers and function names. If I can find the problem in my main function that's fine. I don't think there's a bug in `libsystem_c.dylib`, at least I hope there isn't. – Michahell May 31 '15 at 14:22
  • Compiling with -g or even -ggdb worked, but my program hasn't crashed since when I run it. Using `otool -tv` on the compiled object file gives me assembler-like gibberish and I can't read any useful function signatures: http://pastebin.com/33uACurn Running `otool -tv` on the .out file gives the same kind of gibberish, but now with some more function-like signatures. Where does the 'core' file end up? I tried looking for it but don't know where to look. How else, besides using `otool`, can I disassemble the binary? – Michahell Jun 24 '15 at 22:14
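The byte-offset reading given in the comments can be checked against the trace itself: subtracting the offset from the frame address gives the address where the symbol starts. A minimal sketch, with `symbol_start` as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: given a frame address from the crash report and the
 * "+ N" byte offset printed next to the symbol, return the address at which
 * that symbol begins. */
uint64_t symbol_start(uint64_t frame_address, uint64_t byte_offset)
{
    assert(byte_offset <= frame_address);   /* offset cannot exceed the address */
    return frame_address - byte_offset;
}
```

For frame 1, `symbol_start(0x0000000100006b57, 4839)` gives `0x100005870` as the start of `main`. In the `otool -tv` disassembly, the instruction at `0x100006b57` is where execution would resume after the call, so the call that led into `strtol_l` is the instruction just before it. On OS X, `atos -o controller_2 <address>` may also be able to map such an address to a file and line if the binary was built with debug info. That is untested here and depends on the load address matching.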

1 Answer


You can debug your robot controller process in gdb by using the gdb attach command with the PID of the robot controller process you want to debug. This allows gdb to attach to the running process on the fly and debug it as if it had originally been launched from gdb. This is well explained in the Webots documentation here: http://www.cyberbotics.com/dvd/common/doc/webots/guide/section5.5.html
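For anyone wanting to script the attach step, here is a rough sketch that builds and executes the equivalent of `gdb -p <pid>`. The helper names are hypothetical, and how the PID of a given controller is discovered is left out because the source does not say.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill argv with the equivalent of "gdb -p <pid>".
 * pid_text is caller-provided storage for the decimal PID string.
 * Returns 0 on success, -1 if the buffer is too small. */
int build_attach_argv(pid_t pid, char *pid_text, size_t pid_text_size,
                      const char *argv[4])
{
    int written = snprintf(pid_text, pid_text_size, "%ld", (long)pid);
    if (written < 0 || (size_t)written >= pid_text_size)
        return -1;

    argv[0] = "gdb";
    argv[1] = "-p";
    argv[2] = pid_text;
    argv[3] = NULL;
    return 0;
}

/* Replace the current process with gdb attached to pid.
 * Returns only if building the arguments or execvp fails. */
int attach_debugger(pid_t pid)
{
    char pid_text[32];
    const char *argv[4];

    if (build_attach_argv(pid, pid_text, sizeof pid_text, argv) != 0)
        return -1;

    execvp(argv[0], (char *const *)argv);
    return -1;                  /* reached only when execvp failed */
}
```

Once attached, the process is stopped until `continue` is entered at the gdb prompt.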

Olivier Michel
  • Okay, well yes you can indeed, but in my case this is quite hard / unrealistic to do, I think. I'm running 150 * 10 simulations and the crashes happen at random times (nondeterministic). I'd have to write a script that finds the PIDs of all 2 * 14 controllers and attaches GDB to all those PIDs for each of the 1500 simulations, and after each simulation exits GDB, I assume. I don't think this is the easiest way to figure out what the problem is :/ – Michahell May 31 '15 at 14:19
  • Then maybe you can try to set gdb as the controller program of your robots and set in the controllerArg field the name of the actual controller you want to debug, plus the -x option with the "run" command so that gdb starts running the controller immediately. However, I am not sure what will happen when one of your controllers crashes. I guess you should start webots from the console to see the output of gdb there. Let us know if that is a workable solution. – Olivier Michel Jun 01 '15 at 14:54
  • Hmm, I think (I was not sure before) that the segfault crashes the MacPro entirely. It crashed earlier: I use TeamViewer to remotely monitor the MacPro running the simulations, and at some point the screen blacks out and the connection is lost. I then have to physically reboot the MacPro. Since this just happened, I hope someone can reboot the MacPro for me tomorrow, or else I will know after work whether this is possible to do. I did add -ggdb as a compilation flag, so I'm curious whether I can see more detailed stack trace information in a User Diagnostic Report now. Will update once I know more! – Michahell Jun 03 '15 at 21:45
  • Setting gdb as the controller program did not work. Having a /controllers/gdb/gdb folder structure where gdb is a symlink to /usr/local/bin/gdb didn't work. I tried the same for lldb, which didn't work either. This would be the best option though. I read that gdb and lldb both support debugging multiple threads/processes, so it *should* be possible. I am thinking of writing a new controller that does a system call to gdb to run the real controller. Will that work? – Michahell Jun 24 '15 at 21:57
  • I don't see why it wouldn't work. It is clearly worth trying... Keep us posted. – Olivier Michel Jun 26 '15 at 06:15
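The wrapper-controller idea from the last comments could look roughly like the sketch below: a small program, installed as the controller, that replaces itself with gdb running the real controller in batch mode. The function names and the controller path passed in are hypothetical, and whether Webots accepts such a wrapper is untested here.

```c
#include <stddef.h>
#include <unistd.h>

enum { GDB_ARGV_SLOTS = 9 };

/* Hypothetical helper: fill argv with the equivalent of
 *   gdb -batch -ex run -ex bt --args <controller_path>
 * so gdb starts the controller immediately and, if the controller stops on
 * a signal such as SIGSEGV, prints a backtrace before exiting. */
void build_gdb_argv(const char *controller_path,
                    const char *argv[GDB_ARGV_SLOTS])
{
    size_t i = 0;
    argv[i++] = "gdb";
    argv[i++] = "-batch";       /* run the given commands, then exit */
    argv[i++] = "-ex";
    argv[i++] = "run";
    argv[i++] = "-ex";
    argv[i++] = "bt";
    argv[i++] = "--args";       /* everything after this is the debuggee */
    argv[i++] = controller_path;
    argv[i]   = NULL;
}

/* Replace the wrapper process with gdb running the real controller.
 * Returns only if execvp fails. */
int run_under_gdb(const char *controller_path)
{
    const char *argv[GDB_ARGV_SLOTS];

    build_gdb_argv(controller_path, argv);
    execvp(argv[0], (char *const *)argv);
    return -1;                  /* reached only when execvp failed */
}
```

Using `execvp` rather than `system` keeps the environment and open file descriptors of the wrapper, which matters if Webots communicates with the controller through them. That is an assumption about Webots, not something stated in this thread. Starting Webots from a terminal, as suggested above, would be needed to see the backtrace that gdb prints.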