
I'm writing a TCP server that should accept a single connection at a time, reusing the same address and port for listening. The first connection to a freshly started instance of the server (e.g. via netcat) always succeeds, but subsequent connection attempts hang: the server blocks in accept(), which never returns a socket descriptor. I've experimented with different queue lengths, and with connecting both while the previous connection is still in TIME_WAIT state and after it has cleared, but the result is the same. Both netcat and netstat report the new connection attempt as successful and the connection as established (regardless of whether the previous connection is in TIME_WAIT or has expired), but my server stays stuck in the accept() call, so it never registers the new connection. This doesn't always happen on the very first reconnection attempt, but it almost always happens within the first three attempts.

The code:


int main() {
    Socket socket(10669);
    
    while (true) {
        socket.establish_connection();
        
        socket.receive(callback);
        socket.close_connection();
    }
}



void Socket::establish_connection() {
    // Creating socket file descriptor
    int server_fd = ::socket(AF_INET, SOCK_STREAM, 0);  // ::socket is the POSIX call, not the `socket` member
    if (server_fd < 0) {  // socket() returns -1 on failure, not 0
        throw ...;
    }

    // Setting socket options
    int socket_options = 1;
    if (setsockopt(server_fd, SOL_SOCKET, SO_REUSEPORT, &socket_options, sizeof(socket_options))) {
        throw ...;
    }

    struct sockaddr_in address{};  // zero-initialised, including sin_zero
    address.sin_family = AF_INET;
    address.sin_addr.s_addr = INADDR_ANY;
    address.sin_port = htons(port);

    if (bind(server_fd, (sockaddr *) &address, sizeof(address)) < 0) {
        throw ...;
    }

    if (listen(server_fd, 1) < 0) {
        throw ...;
    }

    spdlog::info("Listening for clients on port {}", port);

    // this is where it blocks at repeated connection attempts
    struct sockaddr_in client_address;
    socklen_t addrlen = sizeof(client_address);
    if ((socket = accept(server_fd, (sockaddr *) &client_address, &addrlen)) < 0) {
        throw ...;
    }

    spdlog::info("Client connected\n");
}


void Socket::receive(SocketCallback callback) {
    while (true) {
        fd_set read_socket_fd;
        FD_ZERO(&read_socket_fd);
        FD_SET(socket, &read_socket_fd);

        int sel = select(socket+1, &read_socket_fd, NULL, NULL, NULL);

        if (sel > 0) {
            // receiving data, no problems here
        }
    }
}


void Socket::close_connection() {
    close(socket);
}

Some output from the server and from netstat:

On startup (server):

[2020-07-07 13:33:53.387] [info] Socket initialised to use port 10669
[2020-07-07 13:33:53.387] [info] Listening for clients on port 10669

On startup (netstat):

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN

On first connection (server):

[2020-07-07 13:34:35.481] [info] Client connected

On first connection (netstat):

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 localhost:54860         localhost:10669         ESTABLISHED
tcp        0      0 localhost:10669         localhost:54860         ESTABLISHED

On first disconnect from the client (server):

[2020-07-07 13:35:47.903] [warning] Client disconnected
[2020-07-07 13:35:47.903] [info] Listening for clients on port 10669

On first disconnect from the client (netstat):

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 localhost:54860         localhost:10669         TIME_WAIT

On the second connection attempt the server prints nothing: it stays on the "Listening for clients..." line, which indicates it is blocked in accept(). This is what netstat reports when I connected immediately after the first disconnect, i.e. while the previous connection was still in TIME_WAIT state:

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        1      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 localhost:54968         localhost:10669         TIME_WAIT
tcp        0      0 localhost:54970         localhost:10669         ESTABLISHED
tcp        0      0 localhost:10669         localhost:54970         ESTABLISHED

The same happens when I wait for TIME_WAIT to expire and only then connect:

tcp        0      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        1      0 0.0.0.0:10669           0.0.0.0:*               LISTEN
tcp        0      0 localhost:10669         localhost:55134         ESTABLISHED
tcp        0      0 localhost:55134         localhost:10669         ESTABLISHED

In both cases the connection is active in netcat and I can type freely, but of course nothing is received. There are no other processes that could intercept the connection.

I know I could switch to a non-blocking accept(), but the blocking behaviour of accept() fits my use case perfectly when it works as intended. So the question is: why does it block on reconnects, and what am I missing here?

lepartit
  • You have to move the socket setup, up to bind(), into a separate function and call it only once, outside the while(true) loop. Then reuse that socket inside the loop to accept connections; listen and accept should be called inside the while loop. A good practice is to start a thread for each new connection, but that's not required; it can be done in a single thread. – armagedescu Jul 07 '20 at 14:42

1 Answer


You are supposed to create one server socket and then call accept repeatedly on the same socket. You seem to be creating a new server socket every time you call accept, and leaving the old ones open.

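Normally, binding a second socket to a port that is already in use fails (with EADDRINUSE), but you used SO_REUSEPORT to tell the operating system that you really want it. With SO_REUSEPORT, incoming connections are balanced across all the server sockets on the same port. Apparently, the operating system chose to send your new connection to the first socket, and then you tried to accept it from the second one, where there wasn't a new connection waiting. Your netstat output is consistent with this: one of the LISTEN sockets shows a Recv-Q of 1, i.e. a connection that is queued but never accepted.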
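To see the load balancing described above, here is a minimal sketch, assuming Linux `SO_REUSEPORT` semantics and listeners bound to loopback only. The helper names (`reuseport_listener`, `local_port`, `connect_client`, `has_pending_connection`) are hypothetical. Two listening sockets share one port, and `poll()` reports which of them actually has a connection queued for `accept()`.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <stdexcept>

// Hypothetical helper: a listening TCP socket on 127.0.0.1:port with
// SO_REUSEPORT set (Linux semantics). Port 0 lets the kernel pick a free port.
int reuseport_listener(uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) throw std::runtime_error("socket() failed");
    int one = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &one, sizeof(one)) < 0 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        throw std::runtime_error("setsockopt/bind/listen failed");
    }
    return fd;
}

// The port the kernel actually bound fd to.
uint16_t local_port(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0)
        throw std::runtime_error("getsockname() failed");
    return ntohs(addr.sin_port);
}

// Hypothetical helper: open a client connection to 127.0.0.1:port.
int connect_client(uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) throw std::runtime_error("socket() failed");
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        close(fd);
        throw std::runtime_error("connect() failed");
    }
    return fd;
}

// True if a completed connection is queued on this listening socket,
// i.e. accept() on it would return instead of blocking.
bool has_pending_connection(int listen_fd, int timeout_ms) {
    pollfd p{};
    p.fd = listen_fd;
    p.events = POLLIN;
    return poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN) != 0;
}
```

With two listeners `a` and `b` on the same port and a single client connected, exactly one of `has_pending_connection(a, …)` and `has_pending_connection(b, …)` should return true; the kernel decides which. Calling `accept()` on the other socket blocks, which matches the hang in the question.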

To fix it, create a server socket once and then always accept from that same socket.
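A minimal sketch of that structure, assuming Linux/POSIX sockets. `make_listener`, `accept_client`, `handle_client` and `run_server` are hypothetical names, not part of the question's `Socket` class. It sets SO_REUSEADDR rather than SO_REUSEPORT: with only one listening socket there is nothing to balance across, and SO_REUSEADDR is what lets a restarted server rebind while old connections are still in TIME_WAIT.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: socket() + setsockopt() + bind() + listen(), done ONCE.
int make_listener(uint16_t port) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) throw std::runtime_error("socket() failed");
    int one = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    // SO_REUSEADDR (not SO_REUSEPORT): allows binding while earlier
    // connections on this port are still in TIME_WAIT.
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {
        close(fd);
        throw std::runtime_error("setsockopt/bind/listen failed");
    }
    return fd;
}

// Block until the next client connects on the SAME listening socket.
int accept_client(int listen_fd) {
    assert(listen_fd >= 0 && "accept_client() needs a listening socket");
    sockaddr_in client{};
    socklen_t len = sizeof(client);
    int fd = accept(listen_fd, reinterpret_cast<sockaddr*>(&client), &len);
    if (fd < 0) throw std::runtime_error("accept() failed");
    return fd;
}

// Hypothetical per-connection handler: read until the client disconnects
// (recv() returns 0) or an error occurs (recv() returns -1).
void handle_client(int client_fd) {
    char buf[4096];
    while (recv(client_fd, buf, sizeof(buf), 0) > 0) {
        // ... process the received bytes ...
    }
}

// Server loop: one listening socket for the lifetime of the server,
// one short-lived connected socket per client.
[[noreturn]] void run_server(uint16_t port) {
    int listen_fd = make_listener(port);
    while (true) {
        int client_fd = accept_client(listen_fd);
        handle_client(client_fd);
        close(client_fd);  // close the connection only, never listen_fd
    }
}
```

In `run_server`, only `client_fd` is closed per iteration; `listen_fd` stays open, so netstat shows a single LISTEN entry for the port for as long as the server runs.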

user253751
  • You're right, I did not notice this at all; now it works as intended. – lepartit Jul 07 '20 at 15:27
  • @user253751 Could you please elaborate on why you are "supposed to create one server socket"? Isn't `close(socket);` actually supposed to terminate said server? If not, is there no other way to actually terminate the socket server completely to be able to create a new one somewhere else? I am asking all of this because I think I ran into the exact same issue, but in an object oriented setup it is not convenient to have "only one socket server", I would like to be able to clean it up completely instead and freely recreate it later if needed. – Flo Jul 12 '23 at 09:30
  • @Flo there's a separate quirk when you "terminate a server" and create another one on the same port, because there is a risk of connections meant for the first server getting associated with the second one instead, the OS makes you wait for a few minutes. You can disable the wait with SO_REUSEADDR (not REUSEPORT) – user253751 Jul 12 '23 at 14:15
  • @user253751 Yes I am using SO_REUSEADDR on my server connections, but what appears to be happening to me is that, after closing the first server socket with `shutdown(srv_socket, SHUT_RDWR)`, starting a second one (hanging on accept) and reconnecting the client socket, the client somehow ends up connecting to the first server socket... Which makes zero sense to me... So I was wondering if maybe I missed some other socket function to call in order to completely terminate that first server socket :) – Flo Jul 12 '23 at 14:28
  • For clarity: I am using both SO_REUSEADDR and SO_REUSEPORT on the server socket. – Flo Jul 12 '23 at 14:50
  • @Flo you missed closing the first server socket with `close`. `shutdown` is for sockets that are actual connections, and it doesn't fully close them anyway. – user253751 Jul 12 '23 at 18:20
  • @user253751 I actually tried adding `close(srv_socket)` after the `shutdown` and I am getting the exact same results... I am also wondering why the LISTEN line in netstat is still here after closing the server, but this is a different question which is being discussed over there: https://stackoverflow.com/questions/33332489/socket-remaining-in-listen-state-after-closing/76677422#76677422 – Flo Jul 13 '23 at 08:41
  • @Flo Did you fork any processes? – user253751 Jul 13 '23 at 19:14
  • @user253751 I did not, I was working with multiple threads in a single process. I also figured out the issue in my case, the solution is in the other thread I linked above. – Flo Aug 16 '23 at 09:53
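For the follow-up question about cleaning up and recreating a server in an object-oriented design, one option is to tie the listening socket's lifetime to an object. This is a minimal sketch assuming Linux; the `Listener` class is hypothetical and binds to loopback only for the sake of the example. Its destructor calls `close()` on the listening descriptor, and because SO_REUSEADDR is set, a new `Listener` can bind the same port afterwards even while connections accepted by the old one are in TIME_WAIT.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cstdint>
#include <stdexcept>

// Hypothetical RAII wrapper: owns exactly one listening socket.
// Destroying it close()s the descriptor, so a new Listener can later be
// created on the same port.
class Listener {
public:
    explicit Listener(uint16_t port) : fd_(::socket(AF_INET, SOCK_STREAM, 0)) {
        if (fd_ < 0) throw std::runtime_error("socket() failed");
        int one = 1;
        sockaddr_in addr{};
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
        addr.sin_port = htons(port);
        // SO_REUSEADDR lets this listener bind even if connections from a
        // previous listener on the same port are still in TIME_WAIT.
        if (setsockopt(fd_, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
            bind(fd_, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
            listen(fd_, 1) < 0) {
            ::close(fd_);  // destructor won't run if the constructor throws
            throw std::runtime_error("setsockopt/bind/listen failed");
        }
    }

    // close(), not just shutdown(), releases the listening socket.
    ~Listener() {
        if (fd_ >= 0) ::close(fd_);
    }

    // One owner per descriptor: copying would lead to a double close().
    Listener(const Listener&) = delete;
    Listener& operator=(const Listener&) = delete;

    int fd() const { return fd_; }

    // The port the kernel actually bound (useful when constructed with 0).
    uint16_t port() const {
        sockaddr_in addr{};
        socklen_t len = sizeof(addr);
        if (getsockname(fd_, reinterpret_cast<sockaddr*>(&addr), &len) < 0)
            throw std::runtime_error("getsockname() failed");
        return ntohs(addr.sin_port);
    }

private:
    int fd_;
};
```

This only releases the port if no other copy of the descriptor exists (for example, one inherited by a forked child process), because the kernel keeps the socket listening until its last descriptor is closed.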