What constitutes "readable" (kqueue/epoll)

Question

I know that if the remote host gracefully shuts down a connection, epoll will report EPOLLIN, and calling read or recv will not block, and will return 0 bytes (i.e. end of stream).

However, if the connection is not closed gracefully, and a write or send operation fails, does this cause epoll to subsequently return EPOLLIN for that socket, producing the same/similar end of stream scenario?

I've tried to find documentation on this behaviour, but have not succeeded, and while I could test it, I'm not interested in what happens on a specific distribution with a specific kernel version.

score 2 · Answer 1 · answered Nov 17 '13 at 00:43

It is indeed not entirely obvious from the specification, but it works as follows for poll():

1. If there is data available to be read, even if the connection is closed, POLLIN is returned.
2. If neither reading nor writing is possible because of a closed connection, POLLHUP or POLLERR is returned.
3. If reading is no longer possible but writing is (such as if the other side did shutdown(SHUT_WR)), POLLIN is returned and POLLHUP and POLLERR are not returned. (This allows waiting for POLLOUT normally.)

The simple thing to do is to try a read when any of POLLIN, POLLHUP and POLLERR are set.

With kqueue(), there is only the EVFILT_READ filter to consider. Its behaviour, including the EV_EOF flag that is set once the peer has shut down, is described in the man page and should be clear enough.

Note that if you don't enable TCP keepalives (FreeBSD enables them by default but most other operating systems do not), waiting for data to read may get stuck forever if the network breaks in certain ways. Even if TCP keepalives are on, it tends to take a few hours to detect a broken connection.

Point 2 is inconsistent with point 3. TCP can't distinguish between an incoming close and an incoming SHUT_WR. Both are a FIN, and both must therefore produce EPOLLIN. — user207421, Nov 17 '13 at 02:45
@EJP TCP can distinguish between the other side doing `SHUT_WR` while still reading data and an error such as a timeout or a packet with RST set that prevents both reading and writing. However, `close` appears to the other side the same way as `SHUT_WR` until an attempt is made to send data, and `SHUT_RD` only affects the other side when it attempts to send data. — jilles, Nov 17 '13 at 22:16

score -1 · Answer 2 · answered Oct 29 '13 at 18:15

It may not return EPOLLIN when the peer machine goes away unexpectedly. I once ran into this with VirtualBox, using the following steps:

Launch the server on one VM.
Launch the client on the other VM, connect to the server, and keep the connection open without sending anything.
Save the client VM's state (similar to hibernating it).

Afterwards, the connection still showed as established on the server VM, according to

netstat -anp --tcp

In other words, EPOLLIN was not triggered on the server.

http://tldp.org/HOWTO/html_single/TCP-Keepalive-HOWTO/ says that the connection will be kept for about 7200 seconds by default before keepalive probing starts.

Of course, you can change the keepalive timeout with setsockopt or kernel parameters.
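
For the setsockopt route, a minimal sketch could look like the following. It assumes the Linux option names (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`); other systems spell these differently or lack them, hence the `#ifdef` guards. `enable_keepalive` is a hypothetical helper name, and the values you pass are up to you.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: turn on TCP keepalive for fd and, where the platform
 * offers the options, tune it per socket.
 *   idle_s     seconds of idleness before the first probe
 *   interval_s seconds between probes
 *   count      unanswered probes before the connection is dropped
 * Returns 0 on success, -1 on failure (errno set by setsockopt). */
static int enable_keepalive(int fd, int idle_s, int interval_s, int count)
{
    int on = 1;

    /* Silence unused-parameter warnings where the TCP_* options are missing. */
    (void)idle_s;
    (void)interval_s;
    (void)count;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
#ifdef TCP_KEEPIDLE
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) < 0)
        return -1;
#endif
#ifdef TCP_KEEPINTVL
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s,
                   sizeof interval_s) < 0)
        return -1;
#endif
#ifdef TCP_KEEPCNT
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof count) < 0)
        return -1;
#endif
    return 0;
}
```

With something like `enable_keepalive(fd, 60, 10, 5)`, a silent peer should be noticed after roughly 60 + 10 * 5 seconds instead of hours, at which point the socket reports an error to epoll.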

However, some books say the better solution is to detect this at the application layer, e.g. by designing the protocol so that dummy messages are sent periodically to check the connection state.
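
The bookkeeping for such an application-level heartbeat can be sketched as below. Everything here (the `heartbeat` struct, the `hb_*` names, the idea of a separate ping interval and dead-peer timeout) is an assumption for illustration only; the actual dummy message format is left to your protocol.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical per-connection heartbeat state. */
struct heartbeat {
    time_t last_rx;    /* when anything was last received from the peer */
    time_t last_tx;    /* when anything was last sent to the peer */
    int    interval_s; /* send a dummy message after this much send idleness */
    int    timeout_s;  /* give up after this much receive silence */
};

static void hb_init(struct heartbeat *hb, time_t now, int interval_s,
                    int timeout_s)
{
    hb->last_rx = now;
    hb->last_tx = now;
    hb->interval_s = interval_s;
    hb->timeout_s = timeout_s;
}

/* Call whenever bytes arrive from the peer (data or a dummy message). */
static void hb_on_receive(struct heartbeat *hb, time_t now) { hb->last_rx = now; }

/* Call whenever bytes are sent to the peer. */
static void hb_on_send(struct heartbeat *hb, time_t now) { hb->last_tx = now; }

/* True when it is time to send a dummy message. */
static bool hb_should_ping(const struct heartbeat *hb, time_t now)
{
    return now - hb->last_tx >= hb->interval_s;
}

/* True when the peer has been silent for longer than the timeout. */
static bool hb_peer_dead(const struct heartbeat *hb, time_t now)
{
    return now - hb->last_rx > hb->timeout_s;
}
```

An event loop would use the smaller of the two remaining times as its epoll_wait timeout, send a dummy message when `hb_should_ping` is true, and close the socket when `hb_peer_dead` is true.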

My question was more specific though. If the connection is closed as in your example, and then a write fails, will EPOLLIN be returned now that it is known that the pipe is broken? — Robert Allan Hennigan Leahy, Oct 29 '13 at 19:05

score -1 · Answer 3 · answered Nov 17 '13 at 00:10

epoll() is basically poll(), but it scales better as the number of fds increases. I am not sure how it behaves when used as an edge-triggered interface. For level-triggered mode, though, yes: it will always return EPOLLIN when end of stream is detected, provided you are listening for that event.

Keep in mind, though, that TCP is not perfect. If the connection is terminated abnormally on the other side (e.g. the physical link goes down), your side may never detect this until you write to the socket. TCP keepalive may help, but not much.

GreenScape

He's not asking about end of stream. He's asking about an ungracefully closed connection. – user207421 Nov 17 '13 at 02:43
@EJP, please tell me, what is the difference? There is no such thing as a "connection closed gracefully". Higher-level protocols determine such things. – GreenScape Nov 18 '13 at 14:43

score -1 · Answer 4 · answered Nov 17 '13 at 02:41

However, if the connection is not closed gracefully, and a write or send operation fails, does this cause epoll to subsequently return EPOLLIN for that socket, producing the same/similar end of stream scenario?

No. That would imply receipt of a FIN, which means normal termination of the connection, which didn't happen. I would expect you would get an EPOLLERR or maybe EPOLLHUP.
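
If you want to see which flags your own system actually reports after a failed send, a small experiment along these lines may help. It is only a sketch assuming Linux epoll; `wait_events` and `drain` are hypothetical helper names, and a local `socketpair()` will not necessarily behave like a TCP connection that was reset.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait on one fd with level-triggered epoll.
 * Returns 1 and stores the event mask, 0 on timeout, -1 on error.
 * EPOLLERR and EPOLLHUP are always reported, so only EPOLLIN is requested. */
static int wait_events(int fd, int timeout_ms, uint32_t *events_out)
{
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    struct epoll_event got;
    int ep = epoll_create1(0);
    int n;

    if (ep < 0)
        return -1;
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
        close(ep);
        return -1;
    }
    do {
        n = epoll_wait(ep, &got, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);
    close(ep);

    if (n < 0)
        return -1;
    *events_out = n ? got.events : 0;
    return n;
}

/* Hypothetical helper: read whatever is still buffered until end of stream
 * or an error, and return the total number of bytes read. */
static size_t drain(int fd)
{
    char buf[4096];
    size_t total = 0;

    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            total += (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        break; /* 0 = end of stream, <0 = error */
    }
    return total;
}
```

A typical run would call `send(fd, ..., MSG_NOSIGNAL)`, and on failure call `wait_events` to inspect the mask for EPOLLIN, EPOLLHUP and EPOLLERR before deciding whether to `drain` or simply close.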

But I'm curious why you wouldn't have already closed the socket on getting the write error, and why you would still be polling it. That's not correct behaviour.

I want to make sure all the data available on the socket is read. If I kill the socket when an error occurs writing, is it not possible that while the connection has terminated, there is still data to read? — Robert Allan Hennigan Leahy, Nov 20 '13 at 19:08
