ptrace'ing multithread application

Question

I have a "debugger"-like application, named hyper-ptrace. It starts user_appl3 which is multithreaded with NPTL.

Main loop of hyper-ptrace is:

wait3(&status, FLAGS, &u);
// find a pid of child, which has a signal
switch (signal = WSTOPSIG(status))
{
  case SIGTRAP:
    do_some_analysis_of_the_child(pid, &status); // up to several ms
    break;
}
ptrace(PTRACE_CONT, pid, 0, 0); // discard signal; user_appl3 doesn't know anything
                                // about this SIGTRAP

The SIGTRAP is generated for user_appl3 by hardware at a periodic interval for each thread, and it is delivered to one of the threads. The interval can be anywhere from 100 ms down to 1 ms, or even less. It is a sort of per-CPU clock with interrupts. Each thread runs only on its own CPU (bound with affinity).

So here is question 1:

If thread1 gets a TRAP and the debugger enters do_some_analysis_of_the_child (so the debugger is not calling wait3 for the second thread), and a little later thread2 gets a TRAP too, what will the Linux kernel do?

In my opinion, thread1 will be stopped because it got a signal and there is a waiting debugger. But thread2 continues to run (does it?). When thread2 gets a signal, there will be no waiting debugger, so the TRAP can be delivered to thread2 itself, effectively killing it. Am I right?

And here is the second question, question 2:

For this case, how should I rewrite the main loop of hyper-ptrace to lower the chances of the signal being delivered through to the user's thread, past the debugger? Neither the trap-generating hardware nor the user application can be changed. Stopping the second thread is not an option either.

I need analysis of both threads. Some parts of it can be done only when the thread is stopped.

Thanks in advance!

SIGTRAP here can come from breakpoints too, without the extreme hardware TRAP generator. – osgx, Aug 19 '10 at 16:06

caf · Accepted Answer · 2010-12-01T09:06:23.183

6

~~No, the signal isn't delivered to the application. The child application will stop when the signal happens, and your ptracing process will be notified about it next time it calls wait().~~

You're right - the tracing stop only applies to the main thread.

To get the behaviour you want, suspend the entire child process (every thread) immediately after the traced thread stops by sending a SIGSTOP to the process PID, and resume it with a SIGCONT when you're done:

wait3(&status, FLAGS, &u);

if (WIFSTOPPED(status))
    kill(pid, SIGSTOP);  /* Signal entire child process to stop */

switch (signal = WSTOPSIG(status))
{
  case SIGTRAP:
    do_some_analysis_of_the_child(pid, &status); // up to several ms
    break;
}

ptrace(PTRACE_CONT, pid, 0, 0); // discard signal, user_appl3 doesn't know anything about this SIGTRAP
kill(pid, SIGCONT);  /* Signal entire child process to resume */
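
One detail of the loop above: passing 0 as the last argument of PTRACE_CONT discards whatever signal caused the stop, not only SIGTRAP. The following is a minimal sketch, not part of the original code, of a helper that swallows SIGTRAP and hands every other signal back to the tracee. The name `signal_to_forward` is hypothetical.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical helper: given a status from wait3()/waitpid(), return the
 * signal number to pass as the last argument of
 * ptrace(PTRACE_CONT, pid, 0, sig).
 * Returns 0 for SIGTRAP (the tracee never sees it), the original signal
 * for any other stop, and -1 if the status is not a stop at all. */
static int signal_to_forward(int status)
{
    if (!WIFSTOPPED(status))
        return -1;

    int sig = WSTOPSIG(status);
    return (sig == SIGTRAP) ? 0 : sig;
}
```

In the main loop this would be used as `ptrace(PTRACE_CONT, pid, 0, signal_to_forward(status))` once the status is known to be a stop. A tracer that sends its own SIGSTOP, as in the snippet above, would probably want to suppress that signal as well; that policy is left out of this sketch.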

edited Dec 01 '10 at 09:06

answered Aug 20 '10 at 01:41


In my case the user application consists of 2 threads. Can you provide a link to sources or documentation for the case with threads? – osgx Aug 20 '10 at 08:20
@osgx: The *entire process*, meaning all threads, will stop on receipt of the signal. Whenever the POSIX documentation says that something happens to a process (eg. "the process is stopped"), it is talking about the whole process in the multithreaded case. If the documentation means a single thread only is affected, it always says so. – caf Nov 09 '10 at 04:19
@caf: I don't think that your last comment is correct. All of my experiments on Linux have indicated that if one thread receives an event, the others continue just fine and it is up to the debugger to stop all of the other threads manually. It may be different for other OSes like the *BSD family, but on Linux each thread is a process, which must be ptraced individually. – Evan Teran Dec 01 '10 at 02:46
@Evan Teran: Having done some more tests, you are completely correct. I was misled by the fact that sending a `SIGSTOP` to a multithreaded process indeed stops every thread (as POSIX demands), but a tracing stop is treated differently (and this is allowed, because `ptrace()` isn't part of POSIX at all). – caf Dec 01 '10 at 08:12
@caf: Yeah, that new snippet will work, but only if you are only interested in events from the main thread. If you want to capture events from other threads (be careful of breakpoints! any thread can trigger them) then you need to attach to each thread manually. It gets really hairy and to be honest, I'm still figuring out how to handle it right. A good reference to the nitty-gritty can be found here: http://code.google.com/p/go/source/browse/src/pkg/debug/proc/ptrace-nptl.txt?r=7afea75128a67b404c087b20a7b704a5480a2178 thanks to the go developers. – Evan Teran Dec 01 '10 at 15:31
One key note is this part (second paragraph): "Note that SIGSTOP differs from its usual behavior when a process is being traced. Usually, a SIGSTOP sent to any thread in a thread group will stop all threads in the thread group. When a thread is traced, however, a SIGSTOP affects only the receiving thread (and any other threads in the thread group that are not traced)." – Evan Teran Dec 01 '10 at 15:32
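
The "attach to each thread manually" step from the comments above could look roughly like this on Linux. This is a sketch under stated assumptions, not code from the thread: the helper names `list_tids` and `attach_all_threads` are hypothetical, thread IDs are read from `/proc/<pid>/task`, and threads created while the directory is being scanned are missed.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: fill tids[] with up to max thread IDs of process
 * pid, read from /proc/<pid>/task. Returns the count, or -1 on error. */
static int list_tids(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);

    DIR *dir = opendir(path);
    if (!dir)
        return -1;

    int n = 0;
    struct dirent *ent;
    while (n < max && (ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);
        /* Accept purely numeric names only; this skips "." and "..". */
        if (end != ent->d_name && *end == '\0')
            tids[n++] = (pid_t)tid;
    }
    closedir(dir);
    return n;
}

/* Hypothetical helper: PTRACE_ATTACH to every thread found and wait for
 * each attach stop. Returns how many threads were attached, or -1 if the
 * thread list could not be read. */
static int attach_all_threads(pid_t pid)
{
    pid_t tids[256];
    int n = list_tids(pid, tids, 256);
    if (n < 0)
        return -1;

    int attached = 0;
    for (int i = 0; i < n; i++) {
        if (ptrace(PTRACE_ATTACH, tids[i], NULL, NULL) == -1)
            continue;               /* thread exited or not permitted */
        int status;
        /* __WALL lets waitpid() report non-leader threads too. */
        if (waitpid(tids[i], &status, __WALL) == tids[i])
            attached++;
    }
    return attached;
}
```

After this, each thread is traced and stopped on its own and has to be resumed individually with PTRACE_CONT. When the tracer starts the program itself, as hyper-ptrace does, setting PTRACE_O_TRACECLONE with PTRACE_SETOPTIONS may be a simpler way to pick up new threads as they are created.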
