27

What is the behavior of the select(2) function when a file descriptor it is watching for reading is closed by another thread?

From some cursory testing, it does not return right away. I suspect the outcome is either that (a) it continues to wait for data, but an actual read from the descriptor would fail with EBADF (possibly; there's a potential race), or (b) it behaves as though the file descriptor had never been passed in. If the latter is true, passing in a single fd with no timeout would cause a deadlock if that fd were closed.

Kara
Joe Shaw
  • Possible duplicate of [breaking out from socket select](http://stackoverflow.com/questions/2486727/breaking-out-from-socket-select) – iammilind May 04 '16 at 13:22
  • I think it's different, although slightly related. The other question is asking explicitly how to break out of a `select()` from another thread (and `pipe()` is a good answer), whereas mine was more about behavior of `close()` on a `select()`ed socket. In the answers below, you'll see that the answer is, "it depends." – Joe Shaw May 10 '16 at 15:15
  • One of the more mysterious bug hunts I went on turned out to be due to just this problem: thread A was selecting on socket #x, which thread B closed. Shortly thereafter, thread C created a new socket, which also just happened to be socket #x (because the networking stack chose to reuse the number x for the new socket). At this point thread A (which was still trying to use socket #x) of course started selecting/reading/writing data on thread C's socket, even though they had absolutely no logical connection to each other. This was a total pain to track down. – Jeremy Friesner Jan 20 '17 at 20:53

4 Answers

24

From some additional investigation, it appears that both dwc and bothie are right.

bothie's answer to the question boils down to: it's undefined behavior. That doesn't necessarily mean it's unpredictable, but that different OSes handle it differently. It would appear that systems like Solaris and HP-UX return from select(2) in this case, but Linux does not, according to this post to the linux-kernel mailing list from 2001.

The argument on the linux-kernel mailing list is essentially that this is undefined (and broken) behavior that should not be relied upon. In Linux's case, calling close(2) on the file descriptor effectively decrements a reference count on it. Since the select(2) call also holds a reference to it, the fd will remain open and waiting for input until the select(2) returns. This is basically dwc's answer. You will get an event on the file descriptor and then it'll be closed. Trying to read from it will result in an EBADF, assuming the fd hasn't been recycled. (That is a concern MarkR raised in his answer, although I think it's probably avoidable in most cases with proper synchronization.)
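One way to get that synchronization, along the lines of the pipe() suggestion mentioned in the question's comments, is to never close the watched descriptor from outside at all. The sketch below is only an illustration under that assumption: the name wait_for_data_or_wake and the wake_fd parameter are hypothetical, not from any library. The selecting thread watches a second "wake" pipe next to the data descriptor; another thread that wants the descriptor gone writes a byte to the wake pipe, and the selecting thread closes the descriptor itself once it is out of select(2).

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/*
 * Hypothetical helper: block until data_fd is readable or a wake-up byte
 * arrives on wake_fd (the read end of a pipe owned by the caller).
 * Returns 1 if data_fd is readable, 0 if a wake-up was requested,
 * and -1 on error. A wake-up takes priority over pending data.
 */
int wait_for_data_or_wake(int data_fd, int wake_fd)
{
    if (data_fd < 0 || wake_fd < 0 ||
        data_fd >= FD_SETSIZE || wake_fd >= FD_SETSIZE)
        return -1;

    for (;;) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(data_fd, &readfds);
        FD_SET(wake_fd, &readfds);
        int maxfd = data_fd > wake_fd ? data_fd : wake_fd;

        int n = select(maxfd + 1, &readfds, NULL, NULL, NULL);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }

        if (FD_ISSET(wake_fd, &readfds)) {
            char byte;
            if (read(wake_fd, &byte, 1) < 0)   /* consume the wake-up byte */
                return -1;
            return 0;               /* caller may now close data_fd safely */
        }
        if (FD_ISSET(data_fd, &readfds))
            return 1;
    }
}
```

With this arrangement only the selecting thread ever calls close() on data_fd, and only after select(2) has returned, so the situation in the question never arises. The other thread just does write(wake_write_end, "x", 1), where wake_write_end is the write end of the same pipe.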

So thank you all for the help.

Community
Joe Shaw
  • It is necessarily unpredictable. There is no predictable way to `close` a file descriptor in one thread while another thread is watching it for reading in `select` because there is no way the thread that calls `close` can know whether or not the other thread is already blocked in `select` or about to block in `select`. – David Schwartz Jan 20 '17 at 19:44
7

I would expect that it would behave as if the end-of-file had been reached, that's to say, it would return with the file descriptor shown as ready but any attempt to read it subsequently would return "bad file descriptor".

Having said that, doing this is very bad practice anyway, because there is always a potential race condition: yet another thread could open a new file descriptor with the same number immediately after the second thread closed the original, and the selecting thread would then end up waiting on the wrong one.

As soon as you close a file, its number becomes available for reuse, and may get reused by the next call to open(), socket() etc, even if by another thread. Therefore you really, really need to avoid this kind of thing.
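The reuse is easy to observe in a single thread. The following is a minimal sketch, not a multi-threaded reproduction: the function name fd_number_is_reused is made up for illustration, and it assumes /dev/null exists and that nothing else in the process opens descriptors in between. It relies on the POSIX rule that open() returns the lowest-numbered descriptor not currently open.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if the descriptor number freed by close() is handed straight
 * back by the next open(), 0 if not, and -1 on error.
 */
int fd_number_is_reused(void)
{
    int pipefd[2];
    if (pipe(pipefd) != 0)
        return -1;

    int stale = pipefd[0];   /* the number a select()ing thread would still hold */
    close(pipefd[0]);        /* stands in for the thread that closes it */

    /* Stands in for a third thread opening something unrelated. */
    int fresh = open("/dev/null", O_RDONLY);
    int reused = (fresh == stale);

    if (fresh >= 0)
        close(fresh);
    close(pipefd[1]);
    return fresh < 0 ? -1 : reused;
}
```

In a real program the close() and the open() would happen in different threads, so the selecting thread has no way to tell that the number it holds now refers to a different file.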

MarkR
  • I thought that it might return as ready too, but that's not quite right: the descriptor isn't actually in a ready state -- it's closed. And as you mention, by the time you go to use it it could be reassigned to something else. – Joe Shaw Feb 12 '09 at 22:07
  • You could avoid the race by using a mutex for the data structure containing the fd, though. But that would only work if the select() call had a timeout defined. – Joe Shaw Feb 13 '09 at 16:41
6

The select system call is a way to wait for file descriptors to change state while the program doesn't have anything else to do. Its main use is in server applications, which open a bunch of file descriptors and then wait for something to do on them (accept new connections, read requests, or send responses). Those file descriptors are opened in non-blocking I/O mode so that the server process never hangs in a syscall.

This also means there is no need for separate threads, because all the work that could be done in a thread can just as well be done before the select call. And if the work takes long, then it can be interrupted: select is called with timeout={0,0}, the file descriptors get handled, and afterwards the work is resumed.

Now, you close a file descriptor in another thread. Why do you have that extra thread at all, and why should it close the file descriptor?

The POSIX standard doesn't give any hint about what happens in this case, so what you're doing is UNDEFINED BEHAVIOR. Expect the result to differ greatly between operating systems and even between versions of the same OS.

Regards, Bodo

Bodo Thiesen
  • I think it'll have undefined behaviour anyway, because it is impossible to remove the race condition of the file descriptor being closed *just before* the select and another one being opened with the same number. – MarkR Feb 13 '09 at 16:32
3

It's a little confusing what you're asking...

select() should return upon an "interesting" change. If the close() merely decremented the reference count and the file was still open for writing somewhere, then there's no reason for select() to wake up.

If the other thread did close() on the only open descriptor then it gets more interesting, but I'd need to see a simple version of the code to see if something's really wrong.

dwc