3

The LuaSocket select function is supposed to tell when a socket can be read without blocking. Apparently it can also be used to tell when a server socket is ready to accept a new connection. However, the documentation gives the following warning:

Another important note: calling select with a server socket in the receive parameter before a call to accept does not guarantee accept will return immediately. Use the settimeout method or accept might block forever.

Under what circumstances can accept block even when select told it was safe to read? Is there a way to force this problem to occur, for testing purposes?

hugomg

2 Answers

2

I don't know where they got that idea. Never seen it in over 20 years of network programming.

It can happen of course if you have multiple select() threads, but I would expect the document to say so if that was what was intended.

user207421
  • I never do trust functions that block indefinitely and rely on the OS to unblock you. At least with a timeout you can do other things or override what the OS is going to do. Especially around blocking connect. – hookenz Apr 15 '13 at 02:29
  • @Matt I always use a read timeout too, or select, but you're still relying on the operating system to unblock you. Blocking connect doesn't block indefinitely; it gives a connection timeout after about a minute, which you can reduce (but not increase). – user207421 Apr 15 '13 at 02:37
  • @EJP: What is the problem if you have multiple threads? Having two threads calling accept even though there might only be one inbound connection? I know that Lua is single-threaded but I'm wondering if this has to do with the warning in the documentation... – hugomg Apr 15 '13 at 02:59
  • @missingno If you have multiple threads that select for readability on the same listening socket, they will both fire when a connection arrives, but only one of the threads can successfully accept it. – user207421 Apr 15 '13 at 10:23
  • @EJP, The Luasocket author is almost certainly basing the warning on Richard Stevens' Unix Network Programming. Longer explanation provided in alternative answer. – rici Apr 18 '13 at 03:22
  • @rici If that's what they meant they should certainly have said so. – user207421 Apr 18 '13 at 10:18
1

This is summarized from Section 16.6 (Nonblocking accept) of the third edition of the late W. Richard Stevens' "Unix Network Programming", pages 461-463. UNP is probably still the best available textbook on writing networking code.

Although you might think that accept cannot block after select indicates that a listening socket is ready, Stevens describes a race condition in some network stack implementations which can cause accept to block indefinitely. (A footnote attributes the description to "A. Gierth".) The problem is described by means of an echo client which:

  1. Connects to the server;

  2. Sets the SO_LINGER socket option on the connected socket, with the linger time set to zero;

  3. Immediately closes the socket. Because SO_LINGER has been set with a zero linger time, closing the socket causes an RST (reset) to be sent instead of the normal connection termination.
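The three client steps above can be sketched in Python roughly as follows. This is an illustration, not Stevens' code: the function names `abortive_linger_value` and `connect_and_reset` are made up here, and the `"ii"` packing assumes a POSIX-style `struct linger` made of two ints (Windows lays the structure out differently).

```python
import socket
import struct


def abortive_linger_value():
    """Pack a linger setting of l_onoff=1, l_linger=0.

    Assumption: struct linger is two C ints, as on Linux and the BSDs.
    """
    return struct.pack("ii", 1, 0)


def connect_and_reset(host, port):
    """Connect, enable zero-time SO_LINGER, and close straight away.

    With linger on and a zero timeout, close() aborts the connection,
    so the peer should see an RST rather than an orderly shutdown.
    """
    sock = socket.create_connection((host, port))        # step 1
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    abortive_linger_value())              # step 2
    sock.close()                                          # step 3
```

Pointing `connect_and_reset` at a listening server that delays its accept call is one way to try to provoke the race described next.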

Now, let's suppose the server is running but on a heavily-loaded machine. The modified echo client is run. The TCP connection causes the select call to return with an indication that there is a connection available. (Remember that the connection was actually accepted by the kernel and put into the accept queue; accept does not need to be executed for this to happen.)

However, the server code is interrupted by a process switch before the accept call is executed, and in the meanwhile, the client manages to finish steps (2) and (3). Then the kernel receives the reset from the client, and now the connection is no longer valid. It might, therefore, remove it from the accept queue.

So by the time the server code gets around to accepting the connection, there is no connection to accept, and the accept call blocks until the next connection, if there is one.

The behaviour described above might not actually happen. POSIX wants the accept call to fail with ECONNABORTED even if there is another available connection in the accept queue (which you also have to remember to deal with). According to Stevens:

In Section 5.11, we noted that when the client aborts the connection before the server calls `accept`, Berkeley-derived implementations do not return the aborted connection to the server, while other implementations should return `ECONNABORTED` but often return `EPROTO` instead.

Stevens' source code is available here, on the publisher's site; the modified client is nonblock/tcpcli03.c, and the modification to the server simply consists of sleeping for five seconds before calling accept. So you can try it on whatever systems you have available.
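For a quick experiment without compiling the C sources, the same idea can be approximated in a single Python process. This is only a sketch under assumptions: `probe_accept_after_abort` is a hypothetical helper, the linger packing assumes a POSIX-style `struct linger`, and the accept call is given a timeout so the probe cannot hang. What it reports depends on the operating system, so no particular result is claimed here.

```python
import select
import socket
import struct
import time


def probe_accept_after_abort(delay=5.0, accept_timeout=2.0):
    """Return (select_said_ready, outcome) for an aborted connection."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # any free loopback port
    listener.listen(5)
    try:
        client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        client.connect(listener.getsockname())

        # select may now report the listening socket as readable...
        readable, _, _ = select.select([listener], [], [], 1.0)

        # ...but the client resets the connection before accept runs.
        # Assumption: struct linger is two C ints (POSIX layout).
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.pack("ii", 1, 0))
        client.close()
        time.sleep(delay)             # stands in for the busy server

        listener.settimeout(accept_timeout)
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            outcome = "timed out"     # accept would have blocked
        except OSError as exc:
            outcome = "error: %s" % exc
        else:
            conn.close()
            outcome = "accepted"
        return bool(readable), outcome
    finally:
        listener.close()
```

A result of `(True, "timed out")` would correspond to the Berkeley-derived behaviour, where select reported readiness but accept then found nothing to return.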

I don't believe that either FreeBSD or Linux exhibits the Berkeley-derived behaviour any more, although I'm pretty sure I remember it happening on FreeBSD (that could have been over a decade ago, and I no longer have a FreeBSD box handy to test it on). OpenBSD seems to have been patched in 1999 to fix the problem (see patch to 2.4); probably the other Berkeley derivatives made similar changes later. I have no idea about Mac OS X (although it's probably the same as FreeBSD) or Windows. It might well be that no modern system exhibits the behaviour, although it was surely observable when Stevens wrote UNP.

In any event, Stevens' advice is pretty simple, and it never hurts to be careful. What he suggests is:

  1. Always set a listening socket to non-blocking when you use select on it;

  2. If accept fails with EWOULDBLOCK, ECONNABORTED, EPROTO, or EINTR, ignore the error and return to the select loop.
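A minimal Python sketch of that advice might look like the following. The names `is_transient_accept_error`, `serve`, `handle` and `should_stop` are hypothetical, and including `EAGAIN` alongside `EWOULDBLOCK` is an assumption made here (the two share a value on many systems), not part of Stevens' list.

```python
import errno
import select
import socket

# Errors from accept that just mean "go back to select".
# EAGAIN is an added assumption; the rest follow the list above.
TRANSIENT_ACCEPT_ERRNOS = {
    errno.EWOULDBLOCK,
    errno.EAGAIN,
    errno.ECONNABORTED,
    errno.EPROTO,
    errno.EINTR,
}


def is_transient_accept_error(exc):
    """True if exc is an OSError carrying one of the ignorable errnos."""
    return isinstance(exc, OSError) and exc.errno in TRANSIENT_ACCEPT_ERRNOS


def serve(listener, handle, should_stop=lambda: False, poll_interval=1.0):
    """select/accept loop over a non-blocking listening socket."""
    listener.setblocking(False)       # advice 1: never block in accept
    while not should_stop():
        readable, _, _ = select.select([listener], [], [], poll_interval)
        if not readable:
            continue
        try:
            conn, addr = listener.accept()
        except OSError as exc:
            if is_transient_accept_error(exc):
                continue              # advice 2: ignore and select again
            raise                     # anything else is a real failure
        handle(conn, addr)
```

Because the listener is non-blocking, a connection that vanished between select and accept shows up as an immediate error instead of a stalled loop.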

rici