10

Suppose an application is blocked at a cancellation point, for example read, and a signal is received and a signal handler invoked. Glibc/NPTL implements cancellation points by enabling asynchronous cancellation for the duration of the syscall, so as far as I can tell, asynchronous cancellation will remain in effect for the entire duration of the signal handler. This would of course be horribly wrong, as there are plenty of functions that are not async-cancel-safe but which are required to be safe to call from signal handlers.

This leaves me with two questions:

  • Am I wrong or is the glibc/NPTL behavior really this dangerously broken? If so, is such dangerous behavior conformant?
  • What, according to POSIX, is supposed to happen if a signal handler is invoked while the process is executing a function which is a cancellation point?

Edit: I've almost convinced myself that any thread which is a potential target of pthread_cancel must ensure that functions which are cancellation points can never be called from a signal handler in that thread's context:

On the one hand, any signal handler that can be invoked in a thread that might be cancelled and which uses any async-cancel-unsafe functions must disable cancellation before calling any function which is a cancellation point. This is because, from the perspective of the code interrupted by the signal, any such cancellation would be equivalent to asynchronous cancellation. On the other hand, a signal handler cannot disable cancellation, unless the code that will be running when the signal handler is invoked only uses async-signal-safe functions, because pthread_setcancelstate is not async-signal-safe.

R.. GitHub STOP HELPING ICE
  • 2
    This series of questions has certainly succeeded in ensuring that my response to encountering pthreads cancellation will be to run fast in the other direction. – caf Mar 23 '11 at 23:59
  • @caf: If you make a habit of keeping signals blocked in threads that might be cancelled (or even better, in all threads but the main thread), and enclose any resource-allocating syscalls that may be cancellation points in calls to fully disable and restore the cancellation state, then cancellation is not dangerous even on fairly bad implementations, and can be a powerful tool. My series of questions has been from an implementor's standpoint, with an aim to fully conform to the spec and possibly go beyond, guaranteeing no resource leaks or state corruption in a conforming application. – R.. GitHub STOP HELPING ICE Mar 24 '11 at 00:18
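The discipline described in the comment above could look roughly like the following. This is only a sketch: the helper names `block_signals_in_this_thread` and `open_uncancellable` are invented for illustration, and `open` stands in for any resource-allocating call that may be a cancellation point.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <signal.h>

/* Hypothetical helper: call first thing in any thread that may be
   cancelled, so that no signal handler ever runs in its context. */
void block_signals_in_this_thread(void)
{
    sigset_t all;
    sigfillset(&all);
    int rc = pthread_sigmask(SIG_BLOCK, &all, NULL);
    assert(rc == 0);
}

/* Hypothetical helper: open() with cancellation fully disabled, so the
   thread can never be cancelled between obtaining the descriptor and
   recording it somewhere a cleanup handler can find it. */
int open_uncancellable(const char *path, int flags)
{
    int old_state, ignored, fd, saved_errno;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old_state);
    fd = open(path, flags);
    saved_errno = errno;
    pthread_setcancelstate(old_state, &ignored);  /* restore, not force-enable */
    errno = saved_errno;
    return fd;
}
```

Restoring the saved state rather than unconditionally re-enabling cancellation keeps the helper usable from code that already runs with cancellation disabled.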

3 Answers

4

To answer the first half of my own question: glibc does exhibit the behavior I predicted. Signal handlers that run while blocked at a cancellation point run under asynchronous cancellation. To see this effect, simply create a thread that invokes a cancellation point that will block forever (or for a long time), wait a moment, send it a signal, wait a moment again, and cancel and join it. The signal handler should fiddle with some volatile variables in a way that makes it clear that it ran for an unpredictable amount of time before being terminated asynchronously.
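A minimal sketch of such a test, assuming a pipe with no writer as the "blocks forever" cancellation point and a handler that spins for about half a second. The names `slow_handler`, `blocked_reader` and `run_experiment` are made up for this sketch; it is not the original test program. The result depends on the libc and its version, so no particular outcome is claimed here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t handler_started;
static volatile sig_atomic_t handler_finished;

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);   /* async-signal-safe */
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Spins for ~0.5 s so that the cancellation request arrives while the
   handler is still running. */
static void slow_handler(int sig)
{
    (void)sig;
    handler_started = 1;
    double end = now() + 0.5;
    while (now() < end) { }
    handler_finished = 1;
}

static void *blocked_reader(void *arg)
{
    int *fds = arg;
    char c;
    ssize_t n = read(fds[0], &c, 1);   /* cancellation point; nothing is ever written */
    (void)n;
    return NULL;
}

/* Returns 1 if the handler ran to completion, 0 if it was cut short
   (i.e. the thread was cancelled asynchronously inside the handler). */
int run_experiment(void)
{
    int fds[2], rc;
    pthread_t t;
    void *res;
    struct sigaction sa = {0};
    struct timespec pause = {0, 100 * 1000 * 1000};   /* 100 ms */

    sa.sa_handler = slow_handler;      /* no SA_RESTART */
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    handler_started = handler_finished = 0;

    rc = pipe(fds);
    assert(rc == 0);
    rc = pthread_create(&t, NULL, blocked_reader, fds);
    assert(rc == 0);

    nanosleep(&pause, NULL);           /* let the thread block in read */
    pthread_kill(t, SIGUSR1);          /* handler starts spinning */
    nanosleep(&pause, NULL);           /* handler is now mid-spin */
    pthread_cancel(t);
    rc = pthread_join(t, &res);
    assert(rc == 0 && res == PTHREAD_CANCELED);

    close(fds[0]);
    close(fds[1]);
    return handler_finished ? 1 : 0;
}
```

Call `run_experiment()` from `main` and print the result: 0 means the handler was terminated part-way through, which is the behavior described above; 1 means the implementation let the handler finish before acting on the cancellation.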

As for whether POSIX allows this behavior, I'm still not 100% certain. POSIX states:

Whenever a thread has cancelability enabled and a cancellation request has been made with that thread as the target, and the thread then calls any function that is a cancellation point (such as pthread_testcancel() or read()), the cancellation request shall be acted upon before the function returns. If a thread has cancelability enabled and a cancellation request is made with the thread as a target while the thread is suspended at a cancellation point, the thread shall be awakened and the cancellation request shall be acted upon. It is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution if:

  • The thread is suspended at a cancellation point and the event for which it is waiting occurs

  • A specified timeout expired

before the cancellation request is acted upon.

Presumably executing a signal handler is not a case of being "suspended", so I'm leaning towards interpreting glibc's behavior here as non-conformant.

Community
R.. GitHub STOP HELPING ICE
1

Rich,

I came across this question while doing the AC-safe documentation review that Alex Oliva was working on for glibc.

It is my opinion that the GNU C Library implementation (NPTL-based) is not broken. While it is true that asynchronous cancellation is enabled around blocking syscalls (which are required to be cancellation points), such behaviour should still be conformant.

It is also true that a signal taken after asynchronous cancellation is enabled will result in a signal handler running with asynchronous cancellation enabled. It is also true that doing anything in that handler that is not also asynchronous cancellation safe is dangerous.

It is also true that if another thread calls pthread_cancel with the thread running the signal handler as the target, the cancellation will be acted upon immediately. This is still in line with the POSIX wording of "before the function returns" (in this case read had not returned and the target thread is in the signal handler).

The problem with the signal is that it puts the thread in two states at once: it is both still inside a cancellation point and executing instructions. If a cancellation request arrives, it is my opinion that acting upon it immediately is conformant, though the Austin Group might clarify.

The problem with the glibc implementation is that it requires all signal handlers executed by the to-be-cancelled thread to call only asynchronous-cancel-safe functions. This is a non-obvious requirement that doesn't stem from the standard, but it doesn't render the implementation non-conformant.

One potential solution to the fragility of signal handlers:

  • Do not enable async-cancellation for blocking syscalls, instead enable a new IN_SYSCALL bit in the cancellation implementation.

  • When pthread_cancel is called and the target thread has IN_SYSCALL set then send SIGCANCEL to the thread as normally would be done for async-cancel, but the SIGCANCEL handler does nothing (other than the side effect of interrupting the syscall).

  • The wrapper around the syscalls will look for cancellation to have been sent and cancel the thread before the wrapper returns.
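As a rough user-space model of these three steps (not glibc code), the idea can be sketched as follows. Everything here is an assumption made for illustration: `struct toy_thread`, `toy_read`, `toy_cancel` and `run_toy_cancel_demo` are invented names, `SIGUSR1` stands in for SIGCANCEL, and the cancellation flag is a plain atomic rather than libc's internal state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-thread state; a real libc would keep this in its
   thread descriptor. */
struct toy_thread {
    pthread_t tid;
    atomic_int in_syscall;       /* the IN_SYSCALL bit */
    atomic_int cancel_pending;   /* set by toy_cancel */
    atomic_int exited;
    int fd;
};

static _Thread_local struct toy_thread *self;

/* Does nothing: its only purpose is to interrupt the blocking syscall. */
static void wakeup_handler(int sig) { (void)sig; }

static void toy_testcancel(void)
{
    if (atomic_load(&self->cancel_pending)) {
        atomic_store(&self->exited, 1);
        pthread_exit(PTHREAD_CANCELED);
    }
}

/* Wrapper: no async cancellation, just a flag around the syscall and a
   check for a pending request before the wrapper returns. */
static ssize_t toy_read(int fd, void *buf, size_t n)
{
    toy_testcancel();
    atomic_store(&self->in_syscall, 1);
    ssize_t r = read(fd, buf, n);
    int saved_errno = errno;
    atomic_store(&self->in_syscall, 0);
    toy_testcancel();
    errno = saved_errno;
    return r;
}

static void toy_cancel(struct toy_thread *t)
{
    atomic_store(&t->cancel_pending, 1);
    if (atomic_load(&t->in_syscall))
        pthread_kill(t->tid, SIGUSR1);   /* stand-in for SIGCANCEL */
}

static void *worker(void *arg)
{
    char c;
    self = arg;
    for (;;)
        toy_read(self->fd, &c, 1);       /* blocks: nothing is written */
    return NULL;
}

/* Returns 1 if the worker was cancelled through the wrapper. */
int run_toy_cancel_demo(void)
{
    int fds[2], rc;
    void *res;
    struct toy_thread t = {0};
    struct sigaction sa = {0};
    struct timespec pause = {0, 10 * 1000 * 1000};   /* 10 ms */

    sa.sa_handler = wakeup_handler;      /* no SA_RESTART, so read gets EINTR */
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    rc = pipe(fds);
    assert(rc == 0);
    t.fd = fds[0];
    rc = pthread_create(&t.tid, NULL, worker, &t);
    assert(rc == 0);

    /* Resend until the worker notices: a signal that lands after the bit
       is set but before the syscall is entered would otherwise be lost. */
    do {
        nanosleep(&pause, NULL);
        toy_cancel(&t);
    } while (!atomic_load(&t.exited));

    rc = pthread_join(t.tid, &res);
    assert(rc == 0);
    close(fds[0]);
    close(fds[1]);
    return res == PTHREAD_CANCELED;
}
```

The retry loop in `run_toy_cancel_demo` papers over a window between setting the flag and entering the syscall. A real implementation would have to close that window properly, and it would also need to decide what to do when the syscall has already succeeded by the time the request is noticed.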

While posting this on stack overflow was fun, I don't know anyone else that reads this and can answer your question in the detail required.

I think any further discussion should happen on the Austin Group mailing list as part of a POSIX standards discussion, or on libc-alpha as part of a glibc implementation discussion.

Carlos O'Donell
  • Your answer does a good job of covering the current implementation in glibc, but in my opinion it's behind with respect to the current state of the conformance discussion. It's now known that the current glibc implementation with temporary async cancellation is non-conforming (due to violating the requirements on [lack of] side effects when cancellation takes place) and moreover that the current implementation produces behavior (risk of double-close and other dangerous race conditions) which the Austin Group seems dedicated to avoiding. – R.. GitHub STOP HELPING ICE Jan 10 '14 at 04:15
  • The `close` issue was covered in Austin Group tracker issue 614, which I reported, where it was deemed already resolved by issue 529, which dealt with `close` and `EINTR`. Issue 529 was resolved by tightening the requirements on the side effects of `close` when `EINTR` is returned. Since they seemed this to have implications for the requirements of `close` under cancellation, I interpret that as a re-affirmation that the requirements that side effects on cancellation match side effects on `EINTR` governs the behavior in the situation we're talking about. – R.. GitHub STOP HELPING ICE Jan 10 '14 at 04:19
  • @R.. I'm glad to see that there is movement with regards to the standards questions. I'm following up in https://sourceware.org/bugzilla/show_bug.cgi?id=12683 with a proposed solution for glibc. – Carlos O'Donell Jan 10 '14 at 19:48
  • BTW, `s/they seemed/they deemed/` in my last comment. Without that, it's confusing... – R.. GitHub STOP HELPING ICE Jan 10 '14 at 19:52
0

I think what you are looking for is a combination of two things:

Some system calls may be interrupted by signals, which causes the EINTR error to be returned. This is normal behavior, but I have never been clear on what happens if, for example, you are in the middle of a read -- is nothing read from the stream? Perhaps somebody can comment on this to help clarify.

System calls which should not be interrupted, like those you are worried about, should be wrapped in calls to sigprocmask (or pthread_sigmask in a thread) to keep them from being interrupted. Once you unblock the signals, any that arrived while they were blocked will be delivered. As with interrupts, though, if you block for too long you may miss some, because multiple instances of the same signal received while blocked count as one pending signal.
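A small sketch of that pattern, assuming `SIGUSR1` as the signal of interest and using `raise` in place of the protected system call so the effect is observable. The names `on_usr1` and `run_masked_section` are made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>

static volatile sig_atomic_t got_signal;

static void on_usr1(int sig) { (void)sig; got_signal++; }

/* Returns how many times the handler ran while SIGUSR1 was blocked and
   stores in *after how many times it had run once the mask was restored. */
int run_masked_section(int *after)
{
    struct sigaction sa = {0};
    sigset_t block, old;
    int rc, during;

    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, NULL);
    assert(rc == 0);
    got_signal = 0;

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    rc = pthread_sigmask(SIG_BLOCK, &block, &old);
    assert(rc == 0);

    /* The call that must not be interrupted would go here. */
    raise(SIGUSR1);
    raise(SIGUSR1);            /* merges with the one already pending */
    during = got_signal;

    /* Restoring the old mask delivers the single pending SIGUSR1. */
    rc = pthread_sigmask(SIG_SETMASK, &old, NULL);
    assert(rc == 0);
    *after = got_signal;
    return during;
}
```

With `SIGUSR1` unblocked on entry, the handler does not run inside the masked section and runs once, not twice, after the mask is restored, which illustrates the point about repeated signals collapsing into one pending instance.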

Jonathan
  • 1
    `read()` can return a partial read or `EINTR` when interrupted by a signal, or it may be restarted. Anyway, R.. isn't asking about this. He's asking about the interactions between pthreads cancellation and signal handling. – ninjalj Mar 23 '11 at 19:33
  • 3
    If `read` is interrupted by a signal and some data has been read, it returns the number of bytes read immediately after the signal handler runs. If no data had been read, it either returns -1 and sets `errno` to `EINTR` (if the handler was installed without `SA_RESTART`) or continues waiting for data after the signal handler returns. – R.. GitHub STOP HELPING ICE Mar 23 '11 at 19:34
  • And yes, as ninjalj said, that's largely unrelated to my question... although there is another interesting issue - I would consider it a bug - in glibc's implementation, whereby signal interruption/restart issues intersect with cancellation issues and can result in leaking resources at cancellation. – R.. GitHub STOP HELPING ICE Mar 23 '11 at 19:43
  • I'm sorry, I must have misunderstood your question. Have you tried seeing what would happen if you sent signals to a program while making these kinds of calls? – Jonathan Mar 23 '11 at 20:04
  • I could run some tests, but in the real world the issue involves a narrow time window that's hard to hit. Perhaps I could just write a signal handler that spins a few billion times and see if it's asynchronously cancellable if it runs while the process is blocked at a cancellation point. – R.. GitHub STOP HELPING ICE Mar 23 '11 at 20:52
  • 1
    I wrote a test program and confirmed that glibc/NPTL exhibits the "unpleasant" behavior I predicted. – R.. GitHub STOP HELPING ICE Mar 23 '11 at 23:19