
I have an application that runs on a large number of processors. On processor 0, I have a function that writes data to a socket if it is open. This function runs in a loop in a separate thread on processor 0, i.e. processor 0 is responsible for its own workload and has an extra thread running the communication on the socket.

// This function runs in a loop and is called every 1.5 seconds.
void T_main_loop(const int& client_socket_id, bool* exit_flag)
{
    // Check whether the socket still appears to be connected.
    int error_code = 0;
    socklen_t error_code_size = sizeof(error_code);
    int rc = getsockopt(client_socket_id, SOL_SOCKET, SO_ERROR, &error_code, &error_code_size);

    if (rc == 0 && error_code == 0)
    {
        // Send some data.
        int valsend = send(client_socket_id, data, size_of_data, 0);
    }
    else
    {
        *exit_flag = false; // This is used for some external logic.
        // Can I fix the broken pipe here somehow?
    }
}

When the client socket is closed, the program should just ignore the error, and this is standard behavior as far as I am aware.

However, I am using an external library (PETSc) that is somehow detecting the broken pipe error and closing the entire parallel (MPI) environment:

[0]PETSC ERROR: Caught signal number 13 Broken Pipe: Likely while reading or writing to a socket

I would like to leave the configuration of this library completely untouched if at all possible. Open to any robust workarounds that are possible.

eyllanesc
wvn
  • You cannot force a broken connection back into a working state, no. – Jesper Juhl Jan 14 '20 at 18:20
  • Why are you using `SO_ERROR` to check the connection state? You should not be doing that at all. Just call `send()` unconditionally, and if it fails then `errno` will tell you why, for instance `EBADF`/`ENOTSOCK`, `ECONNRESET`, `ENOTCONN`, `EPIPE`, etc. Use that instead to detect a broken connection. – Remy Lebeau Jan 15 '20 at 21:20
  • @RemyLebeau The application is designed to close the connection on any error and then reopen a new connection on the same port immediately thereafter. The implementation is simple enough that the actual error code doesn't matter. I am not sure why it shouldn't be done this way; is there any way you could elaborate? – wvn Jan 16 '20 at 11:15
  • @wvn `void T_main_loop(const int& client_socket_id, bool* exit_flag) { int valsend = send(client_socket_id, data, size_of_data, MSG_NOSIGNAL); if (valsend < 0) { /* check errno if needed */ *exit_flag = false; } }` – Remy Lebeau Jan 16 '20 at 15:38
  • @RemyLebeau That works of course, but what is the advantage other than being more concise? – wvn Jan 16 '20 at 16:17
  • @wvn with `SO_ERROR`, you don't know what previous operation failed, or when it failed. Nor do you need to know. With this approach, you want to send data NOW, and it fails NOW, so act on it, regardless of what happened BEFORE. – Remy Lebeau Jan 16 '20 at 19:25

1 Answer

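By default, the OS sends SIGPIPE to a thread that writes to a pipe or socket whose other end has been closed. The default action for SIGPIPE is to terminate the process; judging by the error message, PETSc has installed a handler that catches it instead.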

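One option is to ignore the signal with signal(SIGPIPE, SIG_IGN);. The signal disposition is process-wide, so this affects every thread; the failing send then returns -1 with errno set to EPIPE.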
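A minimal sketch of the first option, assuming a POSIX system. The helper name `errno_after_send_with_sigpipe_ignored` is hypothetical, and a `socketpair` stands in for the real client connection. Whether a library that installs its own signal handlers later would override this setting depends on initialization order, which this sketch does not verify.

```cpp
#include <cerrno>
#include <csignal>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: ignores SIGPIPE process-wide, writes to a socket whose
// peer has already closed, and returns the resulting errno (0 on success).
int errno_after_send_with_sigpipe_ignored()
{
    // Ignore SIGPIPE for the whole process (this affects every thread).
    std::signal(SIGPIPE, SIG_IGN);

    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return errno;

    close(fds[1]); // simulate the client disconnecting

    const char msg[] = "ping";
    ssize_t n = send(fds[0], msg, sizeof(msg), 0); // plain send, no extra flags
    int err = (n < 0) ? errno : 0;                 // capture errno before close()

    close(fds[0]);
    return err;
}
```

With the signal ignored, the failure surfaces only through the return value of `send` and `errno`, so the caller has to check both.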

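Another option is to pass the MSG_NOSIGNAL flag to send, e.g. send(..., MSG_NOSIGNAL);. This suppresses SIGPIPE for that call only and leaves the signal handling of the rest of the process, including PETSc's, untouched.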
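A minimal sketch of the second option, assuming a platform that defines `MSG_NOSIGNAL` (Linux does). The names `send_all_nosignal` and `errno_after_peer_close` are hypothetical, and a `socketpair` again stands in for the real client connection.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical wrapper: sends the whole buffer without ever raising SIGPIPE.
// Returns true on success; on failure returns false and leaves the cause in errno.
bool send_all_nosignal(int fd, const char* data, std::size_t size)
{
    std::size_t sent = 0;
    while (sent < size) {
        ssize_t n = send(fd, data + sent, size - sent, MSG_NOSIGNAL);
        if (n < 0) {
            if (errno == EINTR)
                continue;  // interrupted by a signal, retry
            return false;  // e.g. EPIPE or ECONNRESET
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}

// Demonstration: returns the errno seen when the peer has already closed.
int errno_after_peer_close()
{
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return errno;

    close(fds[1]); // simulate the client disconnecting

    const char msg[] = "ping";
    int err = send_all_nosignal(fds[0], msg, sizeof(msg)) ? 0 : errno;

    close(fds[0]);
    return err;
}
```

In the question's loop, a false return from such a wrapper would be the point at which to clear `exit_flag` and reopen the connection, with no `SO_ERROR` check beforehand.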

Maxim Egorushkin