I am trying to use waitpid() to wait for individual threads instead of processes. I know that pthread_join() or std::thread::join() are the typical ways to wait for a thread. In my case, however, I am developing a monitoring application that forks and executes (via execv) a program which, in turn, spawns some threads. So I cannot join the threads from the monitoring application, since they belong to a different process and I do not have access to its source code. Still, I want to be able to wait for these individual threads to finish.
For an easier visualization of what I am trying to achieve, I include a drawing, hoping to make it much more clear:
Everything works fine when I use processes, but waitpid does not wait for threads. Basically, waitpid returns -1 right after it is called (the thread is still running at that time and keeps running for a few more seconds).
The documentation for waitpid states:
In the Linux kernel, a kernel-scheduled thread is not a distinct construct from a process. Instead, a thread is simply a process that is created using the Linux-unique clone(2) system call; other routines such as the portable pthread_create(3) call are implemented using clone(2). Before Linux 2.4, a thread was just a special case of a process, and as a consequence one thread could not wait on the children of another thread, even when the latter belongs to the same thread group. However, POSIX prescribes such functionality, and since Linux 2.4 a thread can, and by default will, wait on children of other threads in the same thread group.
That description only covers a thread waiting on the children of other threads (in my case, I want to wait for threads that are children of another process). But at least it shows that waitpid is thread-aware.
This is what I am using for waiting for the threads:
std::vector<pid_t> pids;
/* fill vector with thread IDs (LWP IDs) */
for (pid_t pid : pids) {
    int status;
    pid_t res = waitpid(pid, &status, __WALL);
    std::cout << "waitpid rc: " << res << std::endl;
}
This code works when waiting for processes, but it fails when waiting for threads (even if the __WALL flag is used).
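To see why the call fails, it may help to report errno together with the return value. The following is only a diagnostic sketch for Linux; the helper name wait_errno is made up for illustration and is not part of the code above. As far as I understand clone(2), a thread created with CLONE_THREAD cannot have its status collected through the wait(2) family, so ECHILD would be the error I would expect to see for such a thread ID.

```cpp
#include <cerrno>
#include <cstring>
#include <iostream>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper (not from the original code): calls waitpid() once on
// the given ID and returns 0 on success or the errno value on failure.
int wait_errno(pid_t id) {
    int status = 0;
    errno = 0;
    pid_t res = waitpid(id, &status, __WALL);
    if (res == -1) {
        int err = errno;  // save it before any other call can overwrite it
        std::cout << "waitpid(" << id << ") failed: "
                  << std::strerror(err) << std::endl;
        return err;
    }
    std::cout << "waitpid(" << id << ") returned " << res << std::endl;
    return 0;
}
```

Calling wait_errno(pid) inside the loop in place of the bare waitpid call would show which error is reported for each thread ID.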
I am wondering whether it is actually possible to wait for a thread by using waitpid. Is there any other flag that I need to use? Could you point me to any documentation that explains how to wait for threads of another process?
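For comparison, here is a different technique that does not use waitpid at all: polling the /proc/<pid>/task/<tid> directory that Linux exposes for each thread of a process. This is only a sketch under the assumption that /proc is mounted and readable; the helper names thread_alive and wait_for_thread are hypothetical. It cannot report an exit status, and a recycled thread ID could in principle be mistaken for the original thread.

```cpp
#include <chrono>
#include <string>
#include <thread>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: true while /proc/<pid>/task/<tid> exists, i.e. while
// the kernel still lists that thread as part of the given process.
bool thread_alive(pid_t pid, pid_t tid) {
    std::string path = "/proc/" + std::to_string(pid) +
                       "/task/" + std::to_string(tid);
    return access(path.c_str(), F_OK) == 0;
}

// Hypothetical helper: blocks until the thread no longer appears in /proc,
// checking once per polling interval.
void wait_for_thread(pid_t pid, pid_t tid, std::chrono::milliseconds interval) {
    while (thread_alive(pid, tid))
        std::this_thread::sleep_for(interval);
}
```

The polling interval trades latency against overhead: a short interval notices the thread's exit sooner but wakes the monitor more often.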
For reference, the code that I am using for creating the threads is:
static void foo(int seconds) {
    int tid;
    {
        std::lock_guard<std::mutex> lock(mutex);
        tid = syscall(__NR_gettid);
        std::cout << "Thread " << tid << " is running\n";
        pids.push_back(tid);
        pids_ready.notify_all();
    }
    for (int i = 0; i < seconds; i++)
        std::this_thread::sleep_for(std::chrono::seconds(1));
}

static void create_thread(int seconds) {
    std::thread t(foo, seconds);
    threads.push_back(std::move(t));
}

std::vector<pid_t> create_threads(int num, int seconds) {
    for (int i = 0; i < num; i++)
        create_thread(seconds);
    std::unique_lock<std::mutex> lock(mutex);
    pids_ready.wait(lock, [num]() { return pids.size() == num; });
    return pids;
}
I am using GCC 4.6 and Ubuntu 12.04.
UPDATE: I managed to make it work by using ptrace:
ptrace(PTRACE_ATTACH, tid, NULL, NULL);
waitpid(tid, &status, __WALL);
ptrace(PTRACE_CONT, tid, NULL, NULL);
while (true) {
    waitpid(tid, &status, __WALL);
    if (WIFEXITED(status)) // assume it will exit at some point
        break;
    ptrace(PTRACE_CONT, tid, NULL, NULL);
}
This code works both when T1, T2, ..., Tn are processes and when they are threads.
I have an issue, however. If I try this monitoring tool with multithreaded C++ applications, everything works fine. But the original intent was to use it with a Java application that spawns several threads. With a multithreaded Java application, the waitpid in the loop wakes up many times per second (the child thread is stopped by a SIGSEGV signal). This seems to be related to the fact that Java uses SIGSEGV for its own purposes (see this question, and this post).
All those wake-ups end up slowing down the application a lot. So I am wondering whether there is some flaw in my solution and whether there is a way to make it work with Java applications.
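One possible contributor, offered as a guess rather than a confirmed diagnosis: according to ptrace(2), when the data argument of PTRACE_CONT is zero, the intercepted signal is not delivered to the tracee. With NULL in that position, a SIGSEGV that the JVM expects to handle itself would be dropped, and the faulting instruction would presumably just fault again. The sketch below forwards the signal reported by WSTOPSIG instead. The function names are hypothetical, and it assumes that the first stop after PTRACE_ATTACH is the attach SIGSTOP. It would not remove the cost of stopping on every signal, only the repeated faults.

```cpp
#include <csignal>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: given a waitpid() status, returns the signal number
// to hand back to the tracee via PTRACE_CONT, or 0 if it is not a stop.
int signal_to_forward(int status) {
    return WIFSTOPPED(status) ? WSTOPSIG(status) : 0;
}

// Hypothetical variant of the loop above: waits until thread `tid` ends,
// re-injecting each intercepted signal instead of discarding it.
// Returns true once the thread exited or was killed, false on any error.
bool wait_for_traced_thread(pid_t tid) {
    int status = 0;
    if (ptrace(PTRACE_ATTACH, tid, nullptr, nullptr) == -1)
        return false;
    // Assumption: this first stop is the SIGSTOP caused by PTRACE_ATTACH,
    // so it is not forwarded (sig stays 0 for the first PTRACE_CONT).
    if (waitpid(tid, &status, __WALL) == -1)
        return false;
    if (WIFEXITED(status) || WIFSIGNALED(status))
        return true;
    long sig = 0;
    while (true) {
        if (ptrace(PTRACE_CONT, tid, nullptr,
                   reinterpret_cast<void*>(sig)) == -1)
            return false;
        if (waitpid(tid, &status, __WALL) == -1)
            return false;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            return true;
        sig = signal_to_forward(status);  // e.g. SIGSEGV goes back to the JVM
    }
}
```

Unlike the original loop, this version also stops on WIFSIGNALED and on a failed waitpid, so it does not spin forever if the thread is killed or the tracee disappears.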