
I am writing a CHIP-8 interpreter in C++ with SDL2. The source code is at https://github.com/robbie0630/Chip8Emu. It crashes with a segmentation fault when running this ROM. I tried to debug the problem with GDB, but when I type `bt`, it displays an incomplete stack trace that names only the top two functions, so I cannot effectively diagnose the problem. How do I get a full and useful stack trace?

EDIT: When I run bt, GDB displays this:

#0  0x0000000101411a14 in ?? ()
#1  0x0000000000406956 in Chip8_CPU::doCycle (this=0x7fffffffc7b0) at /my/home/code/Chip8Emu/src/cpu.cpp:223
#2  0x0000000000402080 in main (argc=2, argv=0x7fffffffe108) at /my/home/code/Chip8Emu/src/main.cpp:152

This is useless because `??` says nothing about where frame #0 is, and line 223 of cpu.cpp is just a function call.

EDIT 2: I ran valgrind on the program, and here is the output:

==11791== Conditional jump or move depends on uninitialised value(s)
==11791==    at 0x406BA0: Chip8_CPU::doCycle() (cpu.cpp:215)
==11791==    by 0x4020EF: main (main.cpp:152)
==11791== 
==11791== Jump to the invalid address stated on the next line
==11791==    at 0x101411A74: ???
==11791==    by 0x4020EF: main (main.cpp:152)
==11791==  Address 0x101411a74 is not stack'd, malloc'd or (recently) free'd
==11791== 
==11791== 
==11791== Process terminating with default action of signal 11 (SIGSEGV)
==11791==  Access not within mapped region at address 0x101411A74
==11791==    at 0x101411A74: ???
==11791==    by 0x4020EF: main (main.cpp:152)
==11791==  If you believe this happened as a result of a stack
==11791==  overflow in your program's main thread (unlikely but
==11791==  possible), you can try to increase the size of the
==11791==  main thread stack using the --main-stacksize= flag.
==11791==  The main thread stack size used in this run was 8388608.
==11791== 
==11791== HEAP SUMMARY:
==11791==     in use at exit: 7,827,602 bytes in 41,498 blocks
==11791==   total heap usage: 169,848 allocs, 128,350 frees, 94,139,303 bytes allocated
==11791== 
==11791== LEAK SUMMARY:
==11791==    definitely lost: 0 bytes in 0 blocks
==11791==    indirectly lost: 0 bytes in 0 blocks
==11791==      possibly lost: 4,056,685 bytes in 36,878 blocks
==11791==    still reachable: 3,770,917 bytes in 4,620 blocks
==11791==         suppressed: 0 bytes in 0 blocks
==11791== Rerun with --leak-check=full to see details of leaked memory
==11791== 
==11791== For counts of detected and suppressed errors, rerun with: -v
==11791== Use --track-origins=yes to see where uninitialised values come from
==11791== ERROR SUMMARY: 12 errors from 3 contexts (suppressed: 0 from 0)
Killed

EDIT 3: I ran GDB again, this time watching GfxDraw, and I noticed this happened:

Old value = (void (*)(array2d)) 0x1411bc4
New value = (void (*)(array2d)) 0x101411bc4
Chip8_CPU::doCycle (this=0x7fffffffc7a0) at /home/robbie/code/Chip8Emu/src/cpu.cpp:213
(gdb) cont
Continuing.

Thread 1 "Chip8Emu" received signal SIGSEGV, Segmentation fault.
0x0000000101411bc4 in ?? ()

So somehow GfxDraw is getting modified to an invalid function pointer. I can't figure out where it is modified, however.
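
One thing the watchpoint output does show: the old and new values differ in exactly one byte. Byte 4 (counting from the least significant) went from 0x00 to 0x01, which looks more like a stray one-byte store landing inside the pointer than a whole-pointer assignment. Below is a minimal sketch to confirm the arithmetic. `changedBytes` and `checkWatchpointValues` are hypothetical helpers written for this illustration, not part of Chip8Emu, and the byte numbering assumes a little-endian x86-64 target.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the indices (0 = least significant byte)
// of the bytes that differ between two 64-bit values.
std::vector<int> changedBytes(std::uint64_t before, std::uint64_t after) {
    std::vector<int> result;
    const std::uint64_t diff = before ^ after;
    for (int i = 0; i < 8; ++i) {
        if ((diff >> (8 * i)) & 0xFFu) {
            result.push_back(i);
        }
    }
    return result;
}

// The two values are copied from the GDB watchpoint output above.
inline void checkWatchpointValues() {
    const std::vector<int> bytes = changedBytes(0x1411bc4ULL, 0x101411bc4ULL);
    assert(bytes.size() == 1 && bytes[0] == 4);
}
```

If that reading is right, the thing to look for is a one-byte write through an out-of-range index into memory that sits near GfxDraw, rather than an assignment to GfxDraw itself.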

robbie
  • Would you mind showing us a copy-paste (as text) of the GDB output? – Some programmer dude Feb 13 '17 at 00:37
  • Those `??` in the top frame indicate a stack problem. You probably have a buffer overflow somewhere, leading to *undefined behavior* and stack smashing. Use the debugger to step through the code, and employ a memory debugger like [Valgrind](http://valgrind.org/) to help you find the source of the problem. – Some programmer dude Feb 13 '17 at 00:43
  • @Someprogrammerdude I do not know where to step, though. – robbie Feb 13 '17 at 00:49
  • Well, on line 223 in cpu.cpp you have a call to `GfxDraw`. This isn't shown in the call stack, so the problem could be there. Putting a breakpoint on that call and stepping through the `GfxDraw` function might help. But really, try using Valgrind; it will tell you exactly when and where you do anything bad to memory. – Some programmer dude Feb 13 '17 at 00:57
  • @Someprogrammerdude I ran valgrind as per your request, and I edited the output in. – robbie Feb 13 '17 at 01:11
  • I'll see if I can help you later in the morning. In the meantime, perhaps you could include a link to the program you're supposed to run (the CHIP-8 program, I mean)? Is it long? – Some programmer dude Feb 13 '17 at 01:41
  • @Someprogrammerdude I did link it, and I am not sure of its length. – robbie Feb 13 '17 at 01:43
  • Ah sorry, missed that. :) Anyway, it doesn't look *too* big, so it should be possible to step through it *in full* (from the entry of `main`) in a debugger (it's hard and takes time, but is sometimes necessary). – Some programmer dude Feb 13 '17 at 01:46
  • Oh, and in the future I suggest you get a GUI shell for GDB. Then you can easily step through code while at the same time be able to see memory and variables and their values, without having to type commands to see only some of them from time to time. Saves you a lot of time. :) – Some programmer dude Feb 13 '17 at 01:54
  • @Someprogrammerdude GfxDraw is a function pointer, so maybe it gets corrupted somewhere. I'll look into it later. – robbie Feb 13 '17 at 03:33
  • I haven't had much time to work on this, but at least it is reproducible. And it's consistent when and where the problems are, so it should be easy to debug. I'll post an answer with all the details when I'm done, so you can learn how to debug things like this yourself in the future. – Some programmer dude Feb 14 '17 at 18:19

1 Answer


After a few months, I finally identified the problem. Some nasty CHIP-8 programs make illegal accesses to the graphics memory that fall outside the bounds of the array, and those writes corrupt other members of the CPU object (such as GfxDraw). I fixed this by accessing the graphics memory with `at()` and ignoring the resulting `std::out_of_range` exceptions. It seems to work for now, so I'm declaring it the solution.
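
For readers who want to see the shape of that fix, here is a minimal sketch, not the actual Chip8Emu code. It assumes the graphics memory is a nested `std::array` of 64x32 bytes (the standard CHIP-8 screen size). `Display`, `xorPixel`, `demo`, `kWidth` and `kHeight` are hypothetical names chosen for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Assumed screen size: the standard CHIP-8 display is 64x32 pixels.
constexpr std::size_t kWidth = 64;
constexpr std::size_t kHeight = 32;

// Hypothetical stand-in for the interpreter's graphics memory.
struct Display {
    // pixels[y][x], zero-initialised.
    std::array<std::array<std::uint8_t, kWidth>, kHeight> pixels{};

    // XOR one pixel. at() checks both the row and the column index, so an
    // out-of-range write throws instead of silently overwriting whatever
    // is stored next to the array. The exception is caught and the write
    // is dropped. Returns true if the write landed, false if it was dropped.
    bool xorPixel(std::size_t x, std::size_t y, std::uint8_t value) {
        try {
            pixels.at(y).at(x) ^= value;
            return true;
        } catch (const std::out_of_range&) {
            return false;
        }
    }
};

// Example usage: in-range writes land, out-of-range writes are ignored.
inline void demo() {
    Display d;
    assert(d.xorPixel(3, 2, 1));        // inside the screen
    assert(!d.xorPixel(kWidth, 0, 1));  // one column past the right edge
}
```

Dropping the write clips sprites at the screen edge. Another common choice in CHIP-8 interpreters is to wrap coordinates with a modulo instead; which behaviour a given ROM expects is something to verify against that ROM.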

robbie