
I find that Breakpad sometimes fails to handle SIGSEGV. I wrote a simple example to reproduce it:

#include <vector>
#include <breakpad/client/linux/handler/exception_handler.h>

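// Installs Breakpad's in-process handler; minidumps go to /tmp/cores/.
void InitBreakpad()
{
    char core_file_folder[] = "/tmp/cores/";
    google_breakpad::MinidumpDescriptor descriptor(core_file_folder);
    // Intentionally never deleted, so the handler stays installed until exit.
    auto exception_handler_ =
        new google_breakpad::ExceptionHandler(descriptor,
        nullptr,   // filter
        nullptr,   // callback
        nullptr,   // callback_context
        true,      // install_handler
        -1);       // server_fd
    (void)exception_handler_;
}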
int main()
{
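    InitBreakpad();

    // A plain null write does produce a minidump:
    // int* ptr = nullptr;
    // *ptr = 1;
    std::vector<int> sum;
    sum.push_back(1);
    auto it = sum.begin();
    sum.erase(it);  // valid; invalidates `it`
    sum.erase(it);  // invalidated iterator: undefined behavior, crashes here

    return 0;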
}

The compiler is gcc 4.8.5, and my compile command is:

g++ test_breakpad.cpp -I./include -I./include/breakpad -L./lib -lbreakpad -lbreakpad_client -std=c++11 -lpthread

Running a.out prints "Segmentation fault", but no minidump is generated.

If I uncomment the nullptr write, Breakpad works!

What should I do to fix this?

GDB debug output:

(gdb) b google_breakpad::ExceptionHandler::~ExceptionHandler()
Breakpoint 2 at 0x402ed0: file src/client/linux/handler/exception_handler.cc, line 264.
(gdb) c
The program is not being run.
(gdb) r
Starting program: /home/zen/tmp/a.out
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".

Breakpoint 1, google_breakpad::ExceptionHandler::ExceptionHandler (this=0x619040, descriptor=..., filter=0x0, callback=0x0, callback_context=0x0, install_handler=true, server_fd=-1) at src/client/linux/handler/exception_handler.cc:224
224     ExceptionHandler::ExceptionHandler(const MinidumpDescriptor& descriptor,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 libgcc-4.8.5-11.el7.x86_64 libstdc++-4.8.5-11.el7.x86_64
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6
(gdb) c
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.

I also tried Breakpad's out-of-process dump, but still got nothing (the nullptr write works).

Zendo June

1 Answer

After some debugging, I think sum.erase(it) does not create a minidump in your example because of stack corruption.

While debugging, you can see that the variable g_handler_stack_ in src/client/linux/handler/exception_handler.cc is correctly initialized and that the google_breakpad::ExceptionHandler instance is correctly added to the vector. However, when google_breakpad::ExceptionHandler::SignalHandler is called, the vector is reported as empty, even though there were no calls to google_breakpad::ExceptionHandler::~ExceptionHandler or to any std::vector method that would modify the vector.

Two further data points suggest stack corruption. First, the code works with clang++. Second, as soon as we change std::vector<int> sum; to a std::vector<int>* sum, which ensures that we don't corrupt the stack, the minidump is written to disk.
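One way to see how the second erase can damage so much memory is to look at the size of the copy it requests. The sketch below is only a model, not libstdc++ code: it assumes a single-element erase shifts the tail `[pos + 1, end)` down by one slot with a memmove-style copy, which is consistent with the `__memmove_ssse3_back` frame in the GDB output. The helper names `ElementsToShift` and `BytesToCopy` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>

// Hypothetical model: number of elements an erase at index `pos` would
// shift down, i.e. end - (pos + 1), for a vector holding `size` elements.
std::ptrdiff_t ElementsToShift(std::ptrdiff_t pos, std::ptrdiff_t size) {
    return size - (pos + 1);
}

// Hypothetical model: the byte count a memmove-style copy would receive.
// A negative element count wraps around when converted to std::size_t.
std::size_t BytesToCopy(std::ptrdiff_t elements, std::size_t element_size) {
    return static_cast<std::size_t>(elements) * element_size;
}

// First erase:  size == 1, pos == 0 -> 0 elements, nothing is copied.
// Second erase: size == 0, pos == 0 -> -1 elements, so the byte count
//               wraps to a value close to the maximum of std::size_t.
```

Under these assumptions the second call asks for a copy of almost the entire address space, so it would overwrite whatever lies next to the vector's buffer until it reaches an unmapped page and faults.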

moggi
  • Thanks a lot. But after I changed sum to be allocated on the heap, Breakpad stays silent. `std::vector<int>* sum_p = new std::vector<int>; sum_p->push_back(1); auto it = sum_p->begin(); sum_p->erase(it); sum_p->erase(it);` – Zendo June Sep 04 '17 at 04:17
  • Build Breakpad with symbols (and possibly -O0; -O2 is the default) and set a breakpoint in `google_breakpad::ExceptionHandler::HandleSignal`. During my debugging sessions this method was reliably called, but the `g_handler_stack_` variable is corrupted by your double-erase code when compiled with gcc. – moggi Sep 04 '17 at 14:35
  • It's strange. The breakpoint at `google_breakpad::ExceptionHandler::HandleSignal` is **NOT** triggered. GDB shows `Program received signal SIGSEGV, Segmentation fault. 0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6`. Is my environment different? It is CentOS 7 64-bit, gcc 4.8.5, gdb 7.6.1. – Zendo June Sep 05 '17 at 04:17
  • It is correct that gdb first reports a SIGSEGV. The signal handler should be called next. Call continue in gdb and check what happens. – moggi Sep 05 '17 at 04:18
  • Thanks for your patience. The signal handler is not called if I continue. The first time I enter c, gdb shows the same: `Program received signal SIGSEGV, Segmentation fault. 0x00007ffff712f19d in __memmove_ssse3_back () from /lib64/libc.so.6`. The second time I enter c, gdb shows `Program terminated with signal SIGSEGV, Segmentation fault. The program no longer exists.` – Zendo June Sep 05 '17 at 05:50
  • I would start by setting a breakpoint into the constructor and destructor of the google_breakpad::ExceptionHandler class. Make sure that the constructor is called before the crash and the destructor is never called. – moggi Sep 05 '17 at 05:53
  • `Breakpoint 1, google_breakpad::ExceptionHandler::ExceptionHandler (this=0x619040, descriptor=..., filter=0x0, callback=0x0, callback_context=0x0, install_handler=true, server_fd=-1) at src/client/linux/handler/exception_handler.cc:224` is triggered, but the destructor breakpoint is not triggered. – Zendo June Sep 05 '17 at 06:49
  • I can reproduce your problem as soon as I comment out my callback function. With the callback function and a std::vector allocated on the heap, everything works; without the callback function I get the same problems as with the stack version. – moggi Sep 06 '17 at 04:47
  • This and a few more tests show that there is quite a lot of corruption going on with your test (and as there is not much on the stack you corrupt nearly all your variables). clang++ with -O2 does not generate a minidump with or without the callback and with heap or stack allocated std::vector, clang++ -O0 does with all the options. gcc seems to be less affected by the optimizer level and produces the same results with O0 and O2. – moggi Sep 06 '17 at 04:52
  • All in all I think you need to accept that your code is causing undefined behavior with stack and possibly heap corruption that can also result in the breakpad variables being corrupted. I'm still not sure why `google_breakpad::ExceptionHandler::HandleSignal` is not called for you. That is basically just the registered callback function for `sigaction` and should be called by the system/low level C code. – moggi Sep 06 '17 at 04:58
  • Yes, I agree with you. vector erase on an iterator causes a memcpy; erasing an invalid iterator may overwrite data through that memcpy and may corrupt the call stack or the heap. But why is the signal handler not called? – Zendo June Sep 06 '17 at 07:23
  • There are two parts, the signal handler should be called and I can not reproduce in any way a setup in which it is not called. However, what the breakpad internal signal handler does, e.g. writing the minidump and calling the callback, is stored in a variable that is corrupted in the second `std::vector::erase` method call. I can not reproduce in any of my tests a scenario that does not call the signal handler, only cases that don't call the callback and write minidumps. – moggi Sep 06 '17 at 08:49
  • I mean `breakpad::ExceptionHandler::SignalHandler()` is not called. The double erase of the iterator has corrupted Breakpad. My solution for now is to enable system core files (ulimit -c) and add my own signal handler to catch the core dump alongside Breakpad. – Zendo June Sep 07 '17 at 03:52
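The workaround in the last comment could look roughly like the sketch below. It is not taken from the thread: `FallbackCrashHandler`, `InstallFallbackHandler`, and the 64 KiB alternate stack size are hypothetical choices made for illustration. The handler runs on its own stack, writes a short note, restores the default action, and re-raises the signal so that the kernel can write a core file when `ulimit -c` allows it.

```cpp
#include <signal.h>
#include <unistd.h>

// Static buffer for the alternate signal stack, so the handler can still run
// when the normal stack is damaged. The size is an arbitrary choice.
static char g_alt_stack[64 * 1024];

// Hypothetical fallback handler; it only uses async-signal-safe calls.
void FallbackCrashHandler(int signo, siginfo_t*, void*) {
    static const char msg[] = "fatal signal caught, re-raising for core dump\n";
    ssize_t ignored = write(STDERR_FILENO, msg, sizeof(msg) - 1);
    (void)ignored;
    // Restore the default action and re-raise, so the process terminates
    // with the original signal and the kernel may produce a core file.
    signal(signo, SIG_DFL);
    raise(signo);
}

// Hypothetical installer; returns true if the handler is registered.
bool InstallFallbackHandler(int signo) {
    stack_t ss = {};
    ss.ss_sp = g_alt_stack;
    ss.ss_size = sizeof(g_alt_stack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, nullptr) != 0) return false;

    struct sigaction sa = {};
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = FallbackCrashHandler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK | SA_RESETHAND;
    return sigaction(signo, &sa, nullptr) == 0;
}
```

A call such as `InstallFallbackHandler(SIGSEGV);` early in `main` would register it. Note that `sigaction` replaces whatever handler was previously registered for that signal, so how this is ordered or chained with Breakpad's own handler is left open here. Because the handler state lives in static storage, memory corruption of the kind discussed above could still affect it.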