I implemented a recursive algorithm that runs for about 1730 recursions and then crashes with a mysterious SIGSEGV. I ran it under gdb and got the following output:

Program received signal SIGSEGV, Segmentation fault.
0x000000000040b7da in Town::get_cur_capacity (this=0x61efe0) at ./solver/Darstellung.cpp:54
54      return left_over_capacity;
(gdb) print 0x61efe0
$1 = 6418400
(gdb) print *0x61efe0
Cannot access memory at address 0x61efe0
(gdb) print this
$2 = (const Town * const) 0x61efe0
(gdb) 

How can the debugger know that this should be a const Town pointer and yet be unable to access the memory to give me a dump? I am quite sure there is no error in this method itself, as it is called several thousand times before the crash, like every other function in the program. Is there any chance that this is an OS-related problem? I am using 64-bit Ubuntu Linux.

My simplified algorithm:

bool solveproblem(ptr_to_model) {
        if (way_one(ptr_to_model))
              return true;

        if (way_two(ptr_to_model))
              return true;

        if (check_if_solved(ptr_to_model))
              return true;

        return false;
}

bool way_one(ptr_to_model) {
        for (go_through_current_problem_configuration) {
              if (check_stuff) {
                    // adds another problem configuration to the stack within the model
                    ptr_to_model->execute_partial_solution(...);
                    if (solveproblem(ptr_to_model))
                          return true;

                    ptr_to_model->redo_last_step();
              }
        }
        return false;
}

bool way_two(...) {
        /* basically the same as way_one */
}

bool check_if_solved(...) {
        return problem_solved;
}

The model is what its name suggests: it records every step the algorithm has made so far by pushing a new "layer" onto its stack. Each layer is a modified (hopefully simplified) problem derived from the previous one, taking into account the partial solution chosen by the algorithm. I hope I narrowed it down enough and that it is understandable.

Mark B
Sim
  • That's a pretty big recursion depth! Can you give us simplified code of your algorithm? In some cases, you can "unwrap" recursion into an endless loop. – Blender Oct 03 '11 at 20:08
  • This is almost certainly a problem with your code rather than an "OS related problem". You could be running out of some resource (e.g. stack space), or hitting some corner case bug in your code. It's impossible to tell without seeing the code. – NPE Oct 03 '11 at 20:12
  • I've found that debugging deeply recursive algorithms works best by dropping assert()'s into your code to verify invariants. That way you force it to dump core at the moment when a condition you don't expect occurs rather than much later when you get the SIGSEGV. Also asserts are preferable in this case because since you have 1700 recursive calls you'll be hard pressed to find a good breakpoint that doesn't get hit thousands of times. – Kevin Oct 03 '11 at 20:16
  • possible duplicate of [Getting C++ Segmentation Fault error in recursive n-queen program](http://stackoverflow.com/questions/7639378/getting-c-segmentation-fault-error-in-recursive-n-queen-program) – littleadv Oct 03 '11 at 20:18
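
The "unwrap the recursion" and assert() suggestions in the comments above can be sketched together. This is only an illustration under stated assumptions, not the asker's solver: the subset-sum problem, the Frame struct, and the name solve_iteratively are all hypothetical stand-ins for the model, and the values are assumed to be positive. The point is that the pending work lives in a heap-allocated std::vector instead of the call stack, and an assert checks an invariant on every iteration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one "layer" of the model.
struct Frame {
    std::size_t next_choice;  // index of the next candidate to try at this level
    int remaining;            // amount still to be covered at this level
};

// Iterative depth-first backtracking: finds indices of values summing to target.
// Assumes all values are positive. On success, `chosen` holds the picked indices.
bool solve_iteratively(const std::vector<int>& values, int target,
                       std::vector<std::size_t>& chosen) {
    chosen.clear();
    std::vector<Frame> frames;  // explicit stack, stored on the heap
    frames.push_back({0, target});

    while (!frames.empty()) {
        // Invariant: every frame except the root corresponds to one chosen item.
        assert(chosen.size() + 1 == frames.size());

        Frame& top = frames.back();
        if (top.remaining == 0)
            return true;  // plays the role of check_if_solved

        if (top.next_choice == values.size()) {
            // Level exhausted: undo the step that created it (redo_last_step).
            frames.pop_back();
            if (!chosen.empty())
                chosen.pop_back();
            continue;
        }

        const std::size_t i = top.next_choice++;
        const int remaining = top.remaining;  // copy before push_back invalidates top
        if (values[i] <= remaining) {         // plays the role of check_stuff
            chosen.push_back(i);              // execute_partial_solution
            frames.push_back({i + 1, remaining - values[i]});  // the "recursive call"
        }
    }
    return false;
}
```

With this shape, a search that goes 100,000 levels deep only grows the vector; the call stack stays one frame deep regardless of the search depth.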

2 Answers

If you're 1700 levels deep in recursion, it's quite plausible that you overran your stack and corrupted a call parameter, which could easily lead to this sort of crash.

If you use g++ try adding -fstack-protector-all to see if it helps you get better diagnostics.

EDIT: Another indicator is if your backtrace inside gdb becomes circular or doesn't lead anywhere: This is a strong indicator the stack has become corrupted.

And in response to the comment: there isn't a sure-fire way to determine whether something is a stack overflow or a "more normal" heap corruption. Obviously valgrind is always a solid option for memory errors if it's available. You can use ulimit in your shell or (I believe) setrlimit programmatically to configure the stack limit. Note that there are hard upper limits, and that it's often better to change your recursion to be less stack-abusive than to increase the stack size.
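
A minimal sketch of the programmatic route, assuming a POSIX system: getrlimit and setrlimit with RLIMIT_STACK are the real POSIX calls, while the helper names read_stack_limit and raise_stack_limit are made up for this example. Whether raising the soft limit helps a process that is already running depends on the platform, so setting the limit before the program starts (for example with ulimit -s) is the more dependable route.

```cpp
#include <sys/resource.h>  // getrlimit, setrlimit, RLIMIT_STACK (POSIX)

// Hypothetical helper: reads the current soft and hard stack limits.
// Returns false if the query fails.
bool read_stack_limit(struct rlimit& out) {
    return getrlimit(RLIMIT_STACK, &out) == 0;
}

// Hypothetical helper: tries to raise the soft stack limit to wanted_bytes,
// capped at the hard limit. Returns true if the soft limit was already
// sufficient or the setrlimit call succeeded.
bool raise_stack_limit(rlim_t wanted_bytes) {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return false;

    // Nothing to do if the limit is unlimited or already large enough.
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur >= wanted_bytes)
        return true;

    // An unprivileged process cannot exceed the hard limit, so clamp to it.
    rlim_t target = wanted_bytes;
    if (lim.rlim_max != RLIM_INFINITY && target > lim.rlim_max)
        target = lim.rlim_max;

    lim.rlim_cur = target;
    return setrlimit(RLIMIT_STACK, &lim) == 0;
}
```

Calling read_stack_limit early in main and printing rlim_cur is a cheap way to confirm what limit the crashing process actually runs with.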

Mark B

How large are the parameters you're passing on the stack? At that depth, you could be overflowing if each call uses around 5 KB of an 8 MB stack. That's fairly large for stack variables, but possible. Alternatively, you may be smashing your stack by writing past the end of a buffer stored on the stack (often a string buffer). The fact that you crash in the return suggests that's a possibility.
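
The arithmetic behind that estimate can be written out as a quick check. The 8 MiB figure is an assumption (a common Linux default; ulimit -s shows the real value), the depth of 1730 comes from the question, and the names below are made up for this sketch.

```cpp
#include <cstddef>

// Assumed stack size: 8 MiB. Verify with `ulimit -s` on the actual machine.
constexpr std::size_t kStackBytes = 8u * 1024u * 1024u;
// Recursion depth reported in the question.
constexpr std::size_t kDepth = 1730;

// Average number of stack bytes each level may use before the stack is exhausted.
constexpr std::size_t bytes_per_level(std::size_t stack_bytes, std::size_t depth) {
    return stack_bytes / depth;
}

// 8388608 / 1730 = 4848 bytes, i.e. roughly 4.7 KiB per level.
static_assert(bytes_per_level(kStackBytes, kDepth) == 4848,
              "per-level budget for an 8 MiB stack at depth 1730");
```

If each level of the search really consists of two nested calls (solveproblem calling way_one, which calls solveproblem again), the budget per function call is about half of that.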

Rob Napier
  • I am using the standard stack implementation (within my model) and pushing a class containing short variables and vectors of short variables. I use about 300 MB of RAM until it crashes. No variables are passed other than the pointer, if you actually meant the function itself. – Sim Oct 03 '11 at 20:23