How is connect() failure notified in epoll?

Question

I am writing a simple test script (Python) to test a web server's performance (all this server does is an HTTP redirect). The socket is set to non-blocking and registered with an epoll instance.

How can I tell that connect() failed because the server can't accept more connections? I am currently using EPOLLERR as the indicator. Is this correct?

Edit: Assumption: IP-layer network unreachability will not be considered.

score 1 · Answer 1 · answered Nov 15 '12 at 01:35

That catches the case of a refused connection (ECONNREFUSED) and other socket errors. Since I assume you are also registering for read/write availability (success) on the pending socket, you should also manually time out those connections that have not signaled read, write, or error availability on the associated file descriptor within an acceptable time limit.

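ECONNREFUSED is generally only returned when the remote TCP stack answers the connection attempt with an RST: typically because nothing is bound to the remote port, and on some stacks also when the server's accept() queue is full. (Linux, by default, drops packets rather than sending an RST when the accept queue overflows, unless net.ipv4.tcp_abort_on_overflow is enabled, so an overloaded server may show up as a slow or stalled connection instead.) ENETDOWN, EHOSTDOWN, ENETUNREACH, and EHOSTUNREACH are only returned when a layer below TCP (e.g., IP) knows it cannot reach the host for some reason, so they are not particularly helpful for stress testing a web server's performance.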
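As a rough illustration of the distinction drawn above, here is a minimal sketch that sorts the errno of a finished connect() into coarse buckets. The function name `classify_connect_error` and the category strings are hypothetical, chosen for illustration; ETIMEDOUT is included only because a stalled handshake can also surface that way once the kernel gives up.

```python
import errno

# The remote TCP stack actively rejected the attempt (it answered with RST).
REFUSED = {errno.ECONNREFUSED}
# A layer below TCP (e.g. IP routing) knows the host cannot be reached.
UNREACHABLE = {errno.ENETDOWN, errno.EHOSTDOWN, errno.ENETUNREACH, errno.EHOSTUNREACH}


def classify_connect_error(err):
    """Map a finished connect()'s errno (e.g. read via SO_ERROR) to a category."""
    if err == 0:
        return "connected"
    if err in REFUSED:
        return "refused"
    if err in UNREACHABLE:
        return "unreachable"
    if err == errno.ETIMEDOUT:
        # The kernel's own handshake timeout expired; this is usually far
        # longer than any limit a stress test would want to impose.
        return "kernel-timeout"
    return "other"
```

For a stress test, only "refused" and your own timeouts are likely to matter; "unreachable" points at the network rather than the server.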

Thus, you also need to bound the time taken to establish a connection with a timeout to cover the full range of stress-test failure scenarios. The choice of timeout value is up to you.
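One way to bound connection time yourself is to record an absolute deadline for each pending socket and sweep for expired ones after every epoll call. Below is a minimal sketch of that bookkeeping; the `PendingConnects` class and its methods are hypothetical, not part of any library. It uses `time.monotonic()` so that wall-clock adjustments can't stretch or shrink the timeout, and accepts an explicit `now` so the logic can be checked without sleeping.

```python
import time


class PendingConnects:
    """Track in-flight connect() attempts and expire those past a deadline.

    The timeout is the tester's own choice, independent of the kernel's
    (much longer) TCP handshake timeout.
    """

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadlines = {}  # fd -> absolute deadline in monotonic seconds

    def add(self, fd, now=None):
        now = time.monotonic() if now is None else now
        self.deadlines[fd] = now + self.timeout

    def done(self, fd):
        # Call when epoll reports that the attempt finished (success or error).
        self.deadlines.pop(fd, None)

    def expire(self, now=None):
        """Remove and return the fds whose deadline has passed."""
        now = time.monotonic() if now is None else now
        expired = [fd for fd, d in self.deadlines.items() if d <= now]
        for fd in expired:
            del self.deadlines[fd]
        return expired

    def next_timeout(self, now=None):
        """Seconds until the earliest deadline, or None if nothing is pending."""
        if not self.deadlines:
            return None
        now = time.monotonic() if now is None else now
        return max(0.0, min(self.deadlines.values()) - now)
```

The value of `next_timeout()` can be passed straight to `select.epoll.poll()`, where `None` means "wait indefinitely".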

Matthew Hall

Thanks for the answer! Manual timeout handling might take some work with non-blocking sockets, though. Since it's only a redirect server, I am thinking of just measuring how many redirect responses are received per second. – wei Nov 15 '12 at 02:55
One more question: does that mean EPOLLERR is not useful? I need the errno to decide which error occurred; does that mean I have to call connect() again to get it? – wei Nov 15 '12 at 03:01
EPOLLERR, in this context, is useful in that it basically signifies an ECONNREFUSED. All the other reasonable errors are unrelated to a stress test and aren't going to happen as long as you can actually reach the host. Hence, timing out the connections is key. I don't know how you're using epoll, but one way to go is to specify a timeout on the call itself and, after each call, use the time measured between calls to determine which of your pending connections have expired. – Matthew Hall Nov 15 '12 at 03:23
Also, from the connect() man page: if you want to get the error code, make sure you're also registered for write availability, and once it or EPOLLERR is triggered, call getsockopt() on the socket's file descriptor "to read the SO_ERROR option at level SOL_SOCKET." You don't need to register for EPOLLERR, because epoll_wait always reports this event regardless. It's not clear whether write availability is triggered at the same time as a connect() failure, but write availability is the normal way to check for non-blocking connect() errors with select() and poll(). – Matthew Hall Nov 15 '12 at 03:37
Thanks, ahh, I didn't know I could use getsockopt() to get the errno. Yeah, I registered for writability for connect(). From what I can see, a failed connect() triggers both EPOLLOUT and EPOLLERR. As for the timeout issue, I am using epoll in a while loop in which epoll.wait(1) is called, and then I iterate over all the filenos in another loop. I think I understand the approach you described in the comments, but I am not sure I understand why timing out is key? Thanks. – wei Nov 15 '12 at 05:52
Hm... now I did see an ETIMEDOUT error while connect() was in progress. Now I am wondering: how is it different from ECONNREFUSED from a client's point of view? – wei Nov 15 '12 at 19:24
I don't believe that error code is usually useful (i.e., implemented in a way that's consistent and reportable). The timeout, if it applies, is usually very, very long with TCP (think: 120 seconds), but most importantly, it can vary from implementation to implementation. Testing it empirically would also be a chore. If you simply do manual timeouts at a finer granularity that you control (important for stress testing, to prove the server meets an SLA or specific requirements), then it will always work exactly to your spec, and it won't be even 1/100 as hard as using epoll itself. – Matthew Hall Nov 16 '12 at 00:04
`ETIMEDOUT` means that the time from initiating your connect to the remote side's TCP stack completing the three-way handshake exceeded some system-level default timeout value. There could be a variety of reasons why. However, it does not overlap with `ECONNREFUSED`, which actually involves the remote TCP/IP stack sending back an `RST` packet in response to a connection attempt: generally because no process is bound to the requested TCP port or, on some stacks, because adding your connection to the server process's `accept` queue would exceed the limit it specified with `listen`. – Matthew Hall Nov 16 '12 at 00:13
Yes, that's what I have realized. ECONNREFUSED is not relevant in this testing case. Thanks. – wei Nov 16 '12 at 01:07
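The approach from the comments above (a timeout on the epoll call itself, deadlines checked after each call, and SO_ERROR read once EPOLLOUT or EPOLLERR fires) can be sketched as follows. This is a minimal sketch assuming Linux and Python's `select.epoll`; the function `connect_all` and its result strings are hypothetical, and it only opens connections, leaving the HTTP request and redirect counting to the rest of the test script.

```python
import errno
import select
import socket
import time


def connect_all(addrs, timeout):
    """Start a non-blocking connect() to every (host, port) in addrs.

    Returns {addr: result}, where result is "ok", an errno name such as
    "ECONNREFUSED", or "timeout" if epoll reported nothing before the
    per-connection deadline (enforced here, not by the kernel).
    """
    ep = select.epoll()
    pending = {}  # fd -> (sock, addr, absolute deadline)
    results = {}

    def finish(fd, result):
        sock, addr, _ = pending.pop(fd)
        results[addr] = result
        ep.unregister(fd)
        sock.close()

    try:
        for addr in addrs:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sock.setblocking(False)
            err = sock.connect_ex(addr)
            if err not in (0, errno.EINPROGRESS):
                # Failed synchronously; no need to involve epoll.
                results[addr] = errno.errorcode.get(err, str(err))
                sock.close()
                continue
            # Only EPOLLOUT is requested; EPOLLERR/EPOLLHUP are always reported.
            ep.register(sock.fileno(), select.EPOLLOUT)
            pending[sock.fileno()] = (sock, addr, time.monotonic() + timeout)

        while pending:
            # Block no longer than the time left before the earliest deadline.
            earliest = min(deadline for _, _, deadline in pending.values())
            for fd, _mask in ep.poll(max(0.0, earliest - time.monotonic())):
                sock = pending[fd][0]
                # The attempt has finished; SO_ERROR holds its result (0 = ok),
                # so there is no need to call connect() again.
                err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
                finish(fd, "ok" if err == 0 else errno.errorcode.get(err, str(err)))
            # Anything past its deadline without an event counts as a timeout.
            now = time.monotonic()
            for fd in [fd for fd, (_, _, d) in pending.items() if d <= now]:
                finish(fd, "timeout")
    finally:
        for sock, _, _ in pending.values():
            sock.close()
        ep.close()
    return results
```

The loop doesn't care whether a failure arrives as EPOLLOUT, EPOLLERR, or both; SO_ERROR is treated as the authoritative result.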

score 1 · Answer 2 · answered Nov 15 '12 at 02:07

You can't know that it failed "because the server can't accept more connections", because there is no specific protocol signal for that condition. You can only know that it failed for the usual reasons: ECONNREFUSED, connection timeout, EHOSTUNREACH/ENETUNREACH, etc.

user207421
