6

My linux application is performing non-blocking TCP connect syscall and then use epoll_wait to detect three way handshake completion. Sometimes epoll_wait returns with both POLLOUT & POLLERR events set for the same socket descriptor.

I would like to understand what's going on at TCP level. I'm not able to reproduce it on demand. My guess is that between two calls to epoll_wait inside my event loop we had a SYN+ACK/ACK/FIN sequence but again I'm not able to reproduce it.

brian d foy
  • 129,424
  • 31
  • 207
  • 592
doccarcass
  • 71
  • 1
  • 1
  • 2

2 Answers2

6

It is likely for this to happen if the connect has failed - for example with "connection timed out" (for sockets doing a non-blocking connect, POLLOUT becomes set when the connect operation has finished for both successful and unsuccessful outcomes).

When POLLOUT becomes set for the socket, use getsockopt(sock, SOL_SOCKET, SO_ERROR, ...) to check if the connect succeeded or not (the SO_ERROR socket option is 0 in this case, and otherwise indicates why the connect failed).

caf
  • 233,326
  • 40
  • 323
  • 462
4

Here is some good information on non-blocking tcp connect().

When a socket error is detected (i.e. connection closed/refused/timedout), epoll will return the registered interest events POLLIN/POLLOUT with POLLERR. So epoll_wait() will return POLLOUT|POLLERR if you registered POLLOUT, or POLLIN|POLLOUT|POLLERR if POLLIN|POLLOUT was registered.

Just because epoll returns POLLIN doesn't mean there will be data available to read, since recv() may just return the error from the non-blocking connect() call. I think epoll returns all the registered events with POLLERR to make sure the program calls send()/recv()/etc.. and gets the socket error. Some programs never check for POLLERR/POLLHUP and only catch socket errors on the next send()/recv() call.

Neopallium
  • 1,419
  • 9
  • 18