8

I am currently using select loop to manage sockets in a proxy. One of the requirements of this proxy is that if the proxy sends a message to the outside server and does not get a response in a certain time, the proxy should close that socket and try to connect to a secondary server. The closing happens in a separate thread, while the select thread blocks waiting for activity.

I am having trouble figuring out how to detect that this socket closed specifically, so that I can handle the failure. If I call close() in the other thread, I get an EBADF, but I can't tell which socket closed. I tried to detect the socket through the exception fdset, thinking it would contain the closed socket, but I'm not getting anything returned there. I have also heard calling shutdown() will send a FIN to the server and receive a FIN back, so that I can close it; but the whole point is me trying to close this as a result of not getting a response within the timeout period, so I cant do that, either.

If my assumptions here are wrong, let me know. Any ideas would be appreciated.

EDIT: In response to the suggestions about using select time out: I need to do the closing asynchronously, because the client connecting to the proxy will time out and I can't wait around for the select to be polled. This would only work if I made the select time out very small, which would then constantly be polling and wasting resources which I don't want.

Newton Falls
  • 2,148
  • 3
  • 17
  • 22
Doldrim
  • 485
  • 2
  • 5
  • 10
  • 3
    See also http://stackoverflow.com/questions/543541/what-does-select2-do-if-you-close2-a-file-descriptor-in-a-separate-thread – mark4o Aug 25 '09 at 20:22
  • Possible duplicate of [breaking out from socket select](http://stackoverflow.com/questions/2486727/breaking-out-from-socket-select) – iammilind May 04 '16 at 13:22

7 Answers7

9

Generally I just mark the socket for closing in the other thread, and then when select() returns from activity or timeout, I run a cleanup pass and close out all dead connections and update the fd_set. Doing it any other way opens you up to race conditions where you gave up on the connection, just as select() finally recognized some data for it, then you close it, but the other thread tries to process the data that was detected and gets upset to find the connection closed.

Oh, and poll() is generally better than select() in terms of not having to copy as much data around.

woolstar
  • 5,063
  • 20
  • 31
  • 3
    Agreed-- perform the close() in the select() thread. If you need to have another thread detect the timeout, send the select() thread a message through a pipe (which you can put in the select set) instead of calling close() directly. – mark4o Aug 25 '09 at 21:27
3

You cannot free a resource in one thread while another thread is or might be using it. Calling close on a socket that might be in use in another thread will never work right. There will always be potentially disastrous race conditions.

There are two good solutions to your problem:

  1. Have the thread that calls select always use a timeout no greater than the longest you're willing to wait to process a timeout. When a timeout occurs, indicate that some place the thread that calls select will notice when it returns from select. Have that thread do the actual close of the socket in-between calls to select.

  2. Have the thread that detects the timeout call shutdown on the socket. This will cause select to return and then have that thread do the close.

David Schwartz
  • 179,497
  • 17
  • 214
  • 278
1

How to cope with EBADF on select():

int fopts = 0;
for (int i = 0; i < num_clients; ++i) {
    if (fcntl(client[i].fd, F_GETFL, &fopts) < 0) {
        // call close(), FD_CLR(), and remove i'th element from client list
    }
}

This code assumes you have an array of client structures which have "fd" members for the socket descriptor. The fcntl() call checks whether the socket is still "alive", and if not, we do what we have to to remove the dead socket and its associated client info.

Warren Young
  • 40,875
  • 8
  • 85
  • 101
  • This method includes a race condition. A socket can be closed and its fd be reused in a different way (e.g. server connection). The code may still process it as if it was a client connection. – Helmut Grohne Feb 10 '12 at 12:33
0

It's hard to comment when seeing only a small part of the elephant but maybe you are over complicating things?

Presumably you have some structure to keep track of each socket and its info (like time left to receive a reply). You can change the select() loop to use a timeout. Within it check whether it is time to close the socket. Do what you need to do for the close and don't add it to the fd sets the next time around.

Duck
  • 26,924
  • 5
  • 64
  • 92
0

If you use poll(2) as suggested in other answers, you can use the POLLNVAL status, which is essentially EBADF, but on a per-file-descriptor basis, not on the whole system call as it is for select(2).

camh
  • 40,988
  • 13
  • 62
  • 70
-1

Use a timeout for the select, and if the read-ready/write-ready/had-error sequences are all empty (w.r.t that socket), check if it was closed.

shikhar
  • 2,432
  • 1
  • 19
  • 14
-1

Just run a "test select" on every single socket that might have closed with a zero timeout and check the select result and errno until you found the one that has closed.

The following piece of demo code starts two server sockets on separate threads and creates two client sockets to connect to either server socket. Then it starts another thread, that will randomly kill one of the client sockets after 10 seconds (it will just close it). Closing either client socket causes select to fail with error in the main thread and the code below will now test which of the two sockets has actually closed.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>
#include <pthread.h>
#include <stdbool.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/select.h>
#include <sys/socket.h>


static void * serverThread ( void * threadArg )
{
    int res;
    int connSo;
    int servSo;
    socklen_t addrLen;
    struct sockaddr_in soAddr;
    uint16_t * port = threadArg;

    servSo = socket(PF_INET, SOCK_STREAM, 0);
    assert(servSo >= 0);

    memset(&soAddr, 0, sizeof(soAddr));
    soAddr.sin_family = AF_INET;
    soAddr.sin_port = htons(*port);

    // Uncommend line below if your system offers this field in the struct
    // and also needs this field to be initialized correctly.
//  soAddr.sin_len = sizeof(soAddr);

    res = bind(servSo, (struct sockaddr *)&soAddr, sizeof(soAddr));
    assert(res == 0);

    res = listen(servSo, 10);
    assert(res == 0);

    addrLen = 0;
    connSo = accept(servSo, NULL, &addrLen);
    assert(connSo >= 0);

    for (;;) {
        char buffer[2048];
        ssize_t bytesRead;

        bytesRead = recv(connSo, buffer, sizeof(buffer), 0);
        if (bytesRead <= 0) break;

        printf("Received %zu bytes on port %d.\n", bytesRead, (int)*port);
    }
    free(port);
    close(connSo);
    close(servSo);
    return NULL;
}

static void * killSocketIn10Seconds ( void * threadArg )
{
    int * so = threadArg;

    sleep(10);
    printf("Killing socket %d.\n", *so);
    close(*so);
    free(so);
    return NULL;
}


int main ( int argc, const char * const * argv )
{
    int res;
    int clientSo1;
    int clientSo2;
    int * socketArg;
    uint16_t * portArg;
    pthread_t killThread;
    pthread_t serverThread1;
    pthread_t serverThread2;
    struct sockaddr_in soAddr;

    // Create a server socket at port 19500
    portArg = malloc(sizeof(*portArg));
    assert(portArg != NULL);
    *portArg = 19500;
    res = pthread_create(&serverThread1, NULL, &serverThread, portArg);
    assert(res == 0);

    // Create another server socket at port 19501
    portArg = malloc(sizeof(*portArg));
    assert(portArg != NULL);

    *portArg = 19501;
    res = pthread_create(&serverThread1, NULL, &serverThread, portArg);
    assert(res == 0);

    // Create two client sockets, one for 19500 and one for 19501
    // and connect both to the server sockets we created above.

    clientSo1 = socket(PF_INET, SOCK_STREAM, 0);
    assert(clientSo1 >= 0);

    clientSo2 = socket(PF_INET, SOCK_STREAM, 0);
    assert(clientSo2 >= 0);

    memset(&soAddr, 0, sizeof(soAddr));
    soAddr.sin_family = AF_INET;
    soAddr.sin_port = htons(19500);
    res = inet_pton(AF_INET, "127.0.0.1", &soAddr.sin_addr);
    assert(res == 1);

    // Uncommend line below if your system offers this field in the struct
    // and also needs this field to be initialized correctly.
//  soAddr.sin_len = sizeof(soAddr);

    res = connect(clientSo1, (struct sockaddr *)&soAddr, sizeof(soAddr));
    assert(res == 0);

    soAddr.sin_port = htons(19501);
    res = connect(clientSo2, (struct sockaddr *)&soAddr, sizeof(soAddr));
    assert(res == 0);

    // We want either client socket to be closed locally after 10 seconds.
    // Which one is random, so try running test app multiple times.
    socketArg = malloc(sizeof(*socketArg));
    srandomdev();
    *socketArg = (random() % 2 == 0 ? clientSo1 : clientSo2);
    res = pthread_create(&killThread, NULL, &killSocketIn10Seconds, socketArg);
    assert(res == 0);

    for (;;) {
        int ndfs;
        int count;
        fd_set readSet;

        // ndfs must be the highest socket number + 1
        ndfs = (clientSo2 > clientSo1 ? clientSo2 : clientSo1);
        ndfs++;

        FD_ZERO(&readSet);
        FD_SET(clientSo1, &readSet);
        FD_SET(clientSo2, &readSet);

        // No timeout, that means select may block forever here.
        count = select(ndfs, &readSet, NULL, NULL, NULL);

        // Without a timeout count should never be zero.
        // Zero is only returned if select ran into the timeout.
        assert(count != 0);

        if (count < 0) {
            int error = errno;

            printf("Select terminated with error: %s\n", strerror(error));

            if (error == EBADF) {
                fd_set closeSet;
                struct timeval atonce;

                FD_ZERO(&closeSet);
                FD_SET(clientSo1, &closeSet);
                memset(&atonce, 0, sizeof(atonce));
                count = select(clientSo1 + 1, &closeSet, NULL, NULL, &atonce);
                if (count == -1 && errno == EBADF) {
                    printf("Socket 1 (%d) closed.\n", clientSo1);
                    break; // Terminate test app
                }

                FD_ZERO(&closeSet);
                FD_SET(clientSo2, &closeSet);
                // Note: Standard requires you to re-init timeout for every
                // select call, you must never rely that select has not changed
                // its value in any way, not even if its all zero.
                memset(&atonce, 0, sizeof(atonce));
                count = select(clientSo2 + 1, &closeSet, NULL, NULL, &atonce);
                if (count == -1 && errno == EBADF) {
                    printf("Socket 2 (%d) closed.\n", clientSo2);
                    break; // Terminate test app
                }
            }
        }
    }
    // Be a good citizen, close all sockets, join all threads
    close(clientSo1);
    close(clientSo2);
    pthread_join(killThread, NULL);
    pthread_join(serverThread1, NULL);
    pthread_join(serverThread2, NULL);

    return EXIT_SUCCESS;
}

Sample output for running this test code twice:

$ ./sockclose 
Killing socket 3.
Select terminated with error: Bad file descriptor
Socket 1 (3) closed.

$  ./sockclose 
Killing socket 4.
Select terminated with error: Bad file descriptor
Socket 1 (4) closed.

However, if your system supports poll(), I would strongly advise you to consider using this API instead of select(). Select is a rather ugly, legacy API from the past, only left there for backward compatibility with existing code. Poll has a much better interface for this task and it has an extra flag to directly signal you that a socket has closed locally: POLLNVAL will be set on revents if this socket has been closed, regardless which flags you requested on events, since POLLNVAL is an output only flags, that means it is ignored when being set on events. If the socket was not closed locally but the remote server has just closed the connection, the flag POLLHUP will be set in revents (also an output only flag). Another advantage of poll is that the timeout is simply an int value (milliseconds, fine grained enough for real network sockets) and that there are no limitations to the number of sockets that can be monitored or their numeric value range.

Mecki
  • 125,244
  • 33
  • 244
  • 253
  • This won't work. There's no way to guarantee that the socket will still be closed. For example, another thread might call `socket` after the `close` but before you call `select` and get the same descriptor. – David Schwartz Jul 05 '18 at 01:44
  • @DavidSchwartz "For example, another thread might call socket...", then don't make another thread do that. It's your app, you control who can create sockets, where, and when. – Mecki Jul 05 '18 at 09:36
  • The reason I wrote "For example" is because that's just one way it can go wrong. There are other ways. For example, a new version of a library that you've been using can now create a background thread that calls `socket` and suddenly your code breaks when someone upgrades that library. – David Schwartz Jul 05 '18 at 23:23