
By any chance, has anyone experienced the following situation?

For debugging purposes, I used only one MPI process by launching the MPI-based program with "mpirun -np 1". However, while debugging the program, the debugger quite often repeats steps when I step in or step over.

So, let's say I followed the source code line by line until I reached a point of interest. Then I tried to step in and typed "n", expecting to proceed by one line. However, the debugger went back to the first line of the function. Only after experiencing this two or three times could I proceed.

My impression is that the debugger is not doing anything wrong, since the results appear to be correct. I am really curious why this is happening.

Thanks in advance!

  • [This might be relevant to your question.](http://stackoverflow.com/questions/20992356/gdb-jumps-to-wrong-lines-in-out-of-order-fashion) The tl;dr version of that thread is that it might be a compiler bug. – cf- Mar 31 '14 at 04:48
  • Did you compile your code without optimisations, i.e. did you explicitly include `-O0` in the compiler flags? – Hristo Iliev Mar 31 '14 at 08:17
  • @computerfreaker, it might be a stupid question, but what does "tl;dr version of that thread" mean? – user3475359 Apr 01 '14 at 05:41
  • @user3475359 It just means "here's a quick summary of the thread". It's mostly for future reference in case the link breaks or something similar. I also don't actually know if that thread covers your issue, which is why I said it *might* be relevant for you. – cf- Apr 01 '14 at 05:43
  • @HristoIliev, I have just checked the compiler options: it is -O2 together with the -g flag. I guess you're implying that optimization might have caused the weird behavior, right? I will try it with -O0 and report back if that fixes it. I thought the -g option was sufficient for debugging, but that might not be true. – user3475359 Apr 01 '14 at 05:44
  • "tl;dr" means "too long; didn't read". Originated as a pejorative way to dismiss long texts by the Google/Twitter generation (due to the apparent lack of attention span), it is now mostly used as a replacement of "summary". And yes, `-O2 -g` is generally a bad idea. Proper debugging should be done with `-O0 -g`. – Hristo Iliev Apr 01 '14 at 07:06
  • @HristoIliev, Yes, I can confirm that everything now works fine with -O0 -g. Thank you so much! Could you post your comment as an answer to this question? Then I will accept it. Thanks – user3475359 Apr 02 '14 at 04:13
  • Try `-Og -g`, which optimizes for debugging experience. It enables the optimizations that don't interfere with debugging, so you get reasonably fast, yet debuggable, binaries. – bwDraco May 11 '15 at 23:17

1 Answer

The observed behaviour is usually a result of compiler optimisations being active. Optimisation may produce binary code that does not completely follow the structure of the source: it still gives the same result, but the compiler rearranges operations so that they execute more efficiently. Some functions might also get inlined. As a result, the correspondence between instruction ranges and source lines in the debug information becomes unreliable.
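
For illustration, here is a minimal sketch (hypothetical file name `stepdemo.c`; not the asker's code) of the kind of source that typically produces jumpy stepping once optimisation is enabled:

```c
/*
 * Hypothetical example -- not the asker's code.
 * Build it twice and compare the stepping behaviour in gdb:
 *
 *   mpicc -O2 -g stepdemo.c -o stepdemo_opt   # "n" appears to jump around
 *   mpicc -O0 -g stepdemo.c -o stepdemo_dbg   # "n" follows the source lines
 */
#include <stdio.h>

/* Small static function: at -O2 the compiler almost certainly inlines it,
 * so there is no call left for the debugger to step into. */
static double scale(double x)
{
    return 2.0 * x;
}

int main(void)
{
    double sum = 0.0;

    for (int i = 0; i < 1000; ++i) {
        double y = scale((double)i); /* call vanishes after inlining */
        sum += y;                    /* may be fused with the line above */
    }

    /* At -O2, several source lines map to interleaved instructions, so
     * stepping with "n" can bounce between them or back to the loop head. */
    printf("sum = %f\n", sum);
    return 0;
}
```

In the `-O0` build, `n` advances one source line at a time; in the `-O2` build the loop is typically unrolled and `scale` inlined, which reproduces the back-and-forth stepping described in the question.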

When doing source-level debugging, always make sure that optimisation is turned off, either by passing the -O0 flag (most compilers) or by giving no optimisation flags at all (Sun/Oracle compilers). Note that some compiler options implicitly raise the optimisation level. For example, enabling OpenMP support in Sun/Oracle Studio with -xopenmp automatically raises the optimisation level to -xO3; use -xopenmp=noopt instead and do not specify an optimisation level explicitly. Also bear in mind that some compilers, e.g. Intel's, optimise by default.
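
As a usage sketch, here are example command lines for the toolchains mentioned above (the file name `prog.c` and the exact compiler drivers are assumptions about the local setup):

```
# Hypothetical invocations; adjust file names and compiler drivers to your setup.
mpicc -O0 -g prog.c -o prog           # GCC/Clang MPI wrappers: disable optimisation explicitly
cc -g -xopenmp=noopt prog.c -o prog   # Sun/Oracle Studio: OpenMP support without raising the opt level
icc -O0 -g prog.c -o prog             # Intel compiler: optimises by default, so force -O0
```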

Hristo Iliev