I am looking for concrete advice on a rather high-level problem: how to debug software (a genetic algorithm, in case you are interested) that:
- Runs tasks across multiple threads (I don't control which thread runs which task)
- Executes each task based on random values (I don't control the randomization seed)
- Stores each task's state as a complex graph of objects that cannot be easily serialized to a flat, human-readable format
So far, I've tried the following:
- Examining individual threads in a debugger. This is problematic because most tasks complete successfully, so a breakpoint set ahead of the problem fires many times for healthy tasks (false positives). Conversely, if I set a breakpoint that triggers only once a task is in a bad state, I cannot step back in time to figure out how I ended up there.
- Dumping trace logs. This is great in theory (I can go back in time once I spot a bad state), but I haven't yet figured out how to serialize a task's state to a flat, human-readable format.
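To make the trace-log idea concrete, here is a minimal sketch of one possible shape for it, written in Python purely for illustration. Every name in it (`TaskTrace`, `record`, `dump`, `run_task`) is hypothetical, and the "bad state" check is a stand-in. It assumes each task can keep a small bounded buffer of flat summaries (counts, ids, fitness values) instead of serializing the whole object graph, and that the buffer is printed only when the task goes bad.

```python
import threading
from collections import deque


class TaskTrace:
    """Bounded in-memory trace for one task (hypothetical helper)."""

    def __init__(self, task_id, capacity=1000):
        self.task_id = task_id
        # deque(maxlen=...) drops the oldest entry once the buffer is full.
        self.events = deque(maxlen=capacity)

    def record(self, step, **fields):
        # Store small, flat summaries rather than the whole object graph.
        self.events.append((threading.current_thread().name, step, fields))

    def dump(self):
        lines = [f"task {self.task_id}: last {len(self.events)} events"]
        for thread_name, step, fields in self.events:
            details = " ".join(f"{k}={v!r}" for k, v in fields.items())
            lines.append(f"  [{thread_name}] {step} {details}".rstrip())
        return "\n".join(lines)


def run_task(task_id, fitness_values):
    """Toy task: treats a negative fitness value as the bad state."""
    trace = TaskTrace(task_id, capacity=3)
    for generation, fitness in enumerate(fitness_values):
        trace.record("evaluate", generation=generation, fitness=fitness)
        if fitness < 0:  # stand-in for a real bad-state check
            return trace.dump()
    return None  # healthy task: the trace is simply discarded
```

Because each task owns its own buffer, the trace stays coherent no matter which thread ran the task, and healthy tasks produce no output at all. What this does not solve is deciding which flat fields are worth recording.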
In an ideal world, I would like to set a breakpoint on a bad state and then step back in time in the debugger to examine how I got there.
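Short of a real time-travelling debugger, one approximation would be to record the random values a task consumes, so that a failing task can be re-run on its own, on a single thread, under an ordinary debugger. The following is only a sketch under a strong assumption: that tasks can be handed a wrapper around their random source, which the setup described above may not allow. `RecordingRandom`, `ReplayRandom`, and `mutate` are hypothetical names, and `mutate` is a toy stand-in for a real task.

```python
import random


class RecordingRandom:
    """Wraps a random source and remembers every value it hands out."""

    def __init__(self, source=None):
        # No seed is needed: the draws themselves are what gets recorded.
        self._source = source if source is not None else random.Random()
        self.draws = []

    def random(self):
        value = self._source.random()
        self.draws.append(value)
        return value


class ReplayRandom:
    """Returns previously recorded values in the same order."""

    def __init__(self, draws):
        self._draws = iter(draws)

    def random(self):
        return next(self._draws)


def mutate(genome, rng, rate=0.5):
    """Toy task: flips each bit with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in genome]


# First run: record the draws alongside the task.
recorder = RecordingRandom()
first = mutate([0, 1, 0, 1], recorder)

# Later: replay the same draws to reproduce the task exactly.
replayed = mutate([0, 1, 0, 1], ReplayRandom(recorder.draws))
assert first == replayed
```

This only reproduces a task faithfully if its random draws are the sole source of nondeterminism. A task that also depends on state shared with other threads would need that input captured as well.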
Have you run into this kind of problem before? How did you debug it?