I have strace'd a correctly working program to compare it with a faulty one. The correct one has two threads (using pthreads), with one waiting for the other via a regular pthread_join. Under the hood I can see that the waiting (primary) thread waits on a futex (looking at the sources and verifying with GDB, the address matches &pd->tid). The part I don't understand is that there is no matching FUTEX_WAKE on this address when the second thread exits:

2579217 futex(0x7fe28bfff910, FUTEX_WAIT_BITSET|FUTEX_CLOCK_REALTIME, 2579218, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
...
2579218 madvise(0x7fe28bf00000, 1024000, MADV_DONTNEED) = 0
2579218 exit(0)                         = ?
2579217 <... futex resumed>)            = 0
2579218 +++ exited with 0 +++
2579217 exit_group(0)

I suspected that the robust futex ABI leaves the wakeup to the kernel, but the robust list seems to be empty. I think it would make sense to wake up (or segfault?) waiting threads if the memory holding the futex word gets unmapped, but this does not seem to be the case. What makes the kernel wake up the primary thread?

Radim Vansa
  • The kernel simply writes 0 to the TID and then makes thread 1 return from the `futex` syscall (which is blocking). There is no need for any `FUTEX_WAKE`, the kernel itself wakes up the waiter. – Marco Bonelli May 31 '23 at 12:20
  • `strace` will not show the `FUTEX_WAKE` done by the kernel on behalf of the exiting thread. The wake-up is done by the call to `do_futex()` (or `sys_futex()` for older kernels) from the `mm_release()` function in "kernel/fork.c". – Ian Abbott May 31 '23 at 13:28
  • Thanks for the pointer to `mm_release()`, that's what I was looking for (I need to compare why this behaviour does not work in the other program). – Radim Vansa May 31 '23 at 14:22

1 Answer


Thanks to the hints in the comments I was able to figure out what is happening. To explain things a bit further: the wakeup is documented in clone(2) under the CLONE_CHILD_CLEARTID flag. When a thread created with that flag exits, the kernel writes 0 to the registered child-TID address and performs a futex wake on it. The address can be set or changed later in the thread's life using the set_tid_address syscall.
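To make the mechanism concrete, here is a minimal sketch that bypasses pthreads and creates a thread with raw clone() and CLONE_CHILD_CLEARTID, then waits on the TID word with FUTEX_WAIT. The names tid_word, child_fn and spawn_and_wait are made up for this illustration, and the flag set only roughly mirrors what glibc passes for a pthread. It assumes Linux with glibc and a GCC-compatible compiler (for the __atomic builtin).

```c
#define _GNU_SOURCE
#include <linux/futex.h>
#include <sched.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Futex word: the kernel stores the child's TID here at clone time
 * (CLONE_PARENT_SETTID) and zeroes it at exit (CLONE_CHILD_CLEARTID). */
static pid_t tid_word;

/* The child returns immediately; glibc's clone() wrapper then issues
 * the exit syscall for this thread only. */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/* Hypothetical helper: spawn a thread and block until the kernel wakes us.
 * Returns the final value of the TID word (expected 0), or -1 on error. */
int spawn_and_wait(void)
{
    const size_t stack_size = 64 * 1024;
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return -1;

    int flags = CLONE_VM | CLONE_FS | CLONE_FILES | CLONE_SIGHAND |
                CLONE_THREAD | CLONE_SYSVSEM |
                CLONE_PARENT_SETTID | CLONE_CHILD_CLEARTID;

    /* The stack grows down, so pass the top of the allocation. */
    if (clone(child_fn, stack + stack_size, flags, NULL,
              &tid_word, NULL, &tid_word) == -1) {
        free(stack);
        return -1;
    }

    /* Same idea as pthread_join: sleep while the word still holds the TID.
     * The kernel's wake is a non-private futex wake, so the wait must not
     * use FUTEX_PRIVATE_FLAG. Nothing in user space calls FUTEX_WAKE. */
    for (;;) {
        pid_t seen = __atomic_load_n(&tid_word, __ATOMIC_ACQUIRE);
        if (seen == 0)
            break;
        syscall(SYS_futex, &tid_word, FUTEX_WAIT, seen, NULL, NULL, 0);
    }

    free(stack); /* the child no longer touches its stack once the word is 0 */
    return (int)tid_word;
}
```

Calling spawn_and_wait() from main and running the program under strace -f should show the same pattern as in the question: a futex wait that resumes with 0 after the child's exit, with no FUTEX_WAKE visible anywhere.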

One more thing that helped me with debugging is the trace tool from the BCC tools; I was able to inspect what the kernel does with:

sudo trace-bpfcc -K 'mm_release(struct task_struct *tsk, struct mm_struct *mm) "%x %d", tsk->clear_child_tid, mm->mm_users.counter'
Radim Vansa