My code calls a function from a third-party library just before the program exits. Unfortunately, the called function blocks the main thread; the block is caused by a pthread_join() call inside the .so library.

Since the call is inside the library, which is out of my control, I am wondering how to break out of it so the main thread can proceed.

Attaching the info from using gdb:

0x00007ffff63cd06d in pthread_join (threadid=140737189869312, thread_return=0x0)
    at pthread_join.c:89
89          lll_wait_tid (pd->tid);
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.4-5.el6.x86_64 krb5-libs-1.10.3-65.el6.x86_64 libcom_err-1.41.12-23.el6.x86_64 libselinux-2.0.94-7.el6.x86_64 openssl-1.0.1e-57.el6.x86_64

Thanks in advance.

lichgo
  • Presumably it is calling `pthread_join()` for a good reason. Even if you are somehow successful in defeating that call, the result will likely be a crash or other misbehavior, since the thread it is waiting for would presumably still be active and trying to use data structures while they are getting torn down during process exit. A better approach might be to investigate why the thread that the `pthread_join()` call is waiting on isn't exiting -- if you can fix that, then the `pthread_join()` call will return quickly and the problem will be properly solved. – Jeremy Friesner Jan 14 '19 at 04:00
  • While I agree with the comment above, you could preload a shared library using `LD_PRELOAD` that overrides the `pthread_join` method. – Geoffrey Jan 14 '19 at 07:04
  • That would almost certainly be disastrous. – David Schwartz Jan 14 '19 at 08:09
  • @JeremyFriesner Thanks, Jeremy. I totally understand what you suggested, and that is what I've been striving for. This issue actually stems from my change from GLIBC 2.12 to GLIBC 2.14. With GLIBC 2.12 the library function works well and does not call `pthread_join`. After I moved to GLIBC 2.14 (required by some other libraries), the issue appeared. Do you have any clue about possible reasons? – lichgo Jan 14 '19 at 08:14
  • How does the library know which thread is the main thread? Isn't it blocking some specific thread that happened to call a library function earlier on? – MSalters Jan 14 '19 at 08:50

2 Answers

2

The library is designed to have the calling thread wait for something to finish. Since you can't change the design of the library, just call the library from a thread that has nothing else to do.

By the way you design the interaction, you can then get whatever semantics you want. If you want the calling thread to get the results at its convenience later, you can use a promise/future. You can design the calling thread to wait a certain amount of time and then timeout. In the timeout case, you can ignore the result if you don't need it or you can design some way to check and get the result later. You can also have the thread that calls the library do whatever needs to be done with the result so that the calling thread doesn't have to worry about it.
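The future-with-timeout variant described above could be sketched roughly as follows. This is only an illustration: `blocking_library_call` is a hypothetical stand-in for the third-party function, and `call_with_timeout` is a made-up helper name. The worker thread is detached, so if the timeout expires the library call keeps running in the background and its result is simply dropped.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical stand-in for the blocking third-party call (an assumption,
// not the real library API). Here it just sleeps briefly and returns a value.
int blocking_library_call() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return 42;
}

// Runs `fn` on its own detached thread and waits at most `timeout` for it.
// Returns true and stores the result in `out` if `fn` finished in time;
// returns false (leaving `out` untouched) if the wait timed out.
bool call_with_timeout(std::function<int()> fn,
                       std::chrono::milliseconds timeout,
                       int& out) {
    std::packaged_task<int()> task(std::move(fn));
    std::future<int> result = task.get_future();

    // The thread owns the task; the caller only keeps the future.
    // Unlike a future from std::async, this future's destructor does not block.
    std::thread(std::move(task)).detach();

    if (result.wait_for(timeout) != std::future_status::ready) {
        return false;  // still running; the caller decides what to do next
    }
    out = result.get();
    return true;
}
```

A caller would write something like `int r; if (call_with_timeout(blocking_library_call, std::chrono::seconds(5), r)) { /* use r */ }`. Note that a detached thread that is still running when `main()` returns is killed at process exit, which may or may not be acceptable for the library in question.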

Just quarantine the code you can't control and write whatever code around it you need to get the behavior your code needs. The library needs the thread that calls it to wait until it's done, so isolate the thread that calls it and let the library have what it wants.

David Schwartz
  • Thanks, David. Actually in my main thread, I did detach a thread that calls the library. I found that when the detached thread is destroyed, the blocking thread in the library is still there, and thus blocks the main thread. I am very curious about why it happened. – lichgo Jan 14 '19 at 08:06
  • I don't understand what you mean by "blocks the main thread" exactly. – David Schwartz Jan 14 '19 at 08:08
  • Sorry to confuse you. By "blocks the main thread", I mean the program never exits, even when it reaches the last line in `main()`. – lichgo Jan 14 '19 at 08:17
  • @lichgo Probably because the library is still doing something it considers important. Does it provide some kind of shut down function? – David Schwartz Jan 14 '19 at 08:18
  • Hi David. Actually the function I called is exactly its shutdown function. It works well with GLIBC 2.12. After I moved to GLIBC 2.14, the issue appeared. – lichgo Jan 14 '19 at 08:21
1

If you call exit, the process is terminated without shutting down the other threads.

If you have a pthread_t handle for the thread that is being waited on, you can perhaps call pthread_cancel on it, but if the application and libraries are not prepared to handle thread cancellation, it will cause other problems. (Canceling the thread that calls pthread_join will not help, because the shutdown will then block on the same thread that pthread_join waits on.)
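For illustration only, cancelling a thread through its handle might look like the sketch below. `stuck_worker` and `cancel_and_join` are hypothetical names, and the worker here is a toy that blocks in `sleep()`, which is a cancellation point. A real library thread may hold locks or other resources when it is cancelled, which is exactly the risk described above.

```cpp
#include <pthread.h>
#include <unistd.h>

// Hypothetical worker that never finishes on its own (an assumption for the
// demo). sleep() is a cancellation point, so a pending cancel takes effect there.
void* stuck_worker(void*) {
    for (;;) {
        sleep(1);
    }
    return nullptr;  // never reached
}

// Requests cancellation of `tid`, then joins it.
// Returns true only if the thread actually terminated via cancellation.
bool cancel_and_join(pthread_t tid) {
    if (pthread_cancel(tid) != 0) {
        return false;  // e.g. invalid thread handle
    }
    void* status = nullptr;
    if (pthread_join(tid, &status) != 0) {
        return false;
    }
    // A cancelled thread reports PTHREAD_CANCELED as its exit status.
    return status == PTHREAD_CANCELED;
}
```

This only works when the target thread reaches a cancellation point with cancellation enabled; a thread spinning in pure computation or with cancellation disabled will not stop.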

In general, it is probably a better idea to figure out why the pthread_join call is waiting indefinitely in your environment (that is, why the other thread is not terminating), and fix that.

Florian Weimer