
I am having trouble with zombie processes. I wrote a simple server that sets up tic-tac-toe matches between players, using select() to multiplex between the connected clients. Whenever two clients are available, the server forks a child process that execs a match arbiter program.

The problem is that select() blocks. If a match arbiter running as a child process exits while there are no incoming connections, the parent never gets to wait for it, because it is still blocked in select().

Here is my code; apologies that it is quite messy.

while(1) {
    if (terminate)
        terminate_program();
    FD_ZERO(&rset);
    FD_SET(tcp_listenfd, &rset);
    FD_SET(udpfd, &rset);
    maxfd = max(tcp_listenfd, udpfd);

    /* add child connections to set */
    for (i = 0; i < MAXCLIENTS; i++) {
        sd = tcp_confd_lst[i];
        if (sd > 0)
            FD_SET(sd, &rset);
        if (sd > maxfd)
            maxfd = sd;
    }

    /* Here select blocks */
    if ((nready = select(maxfd + 1, &rset, NULL, NULL, NULL)) < 0) {
        if (errno != EINTR)
            perror("select error");
        continue; /* rset is undefined after an error, so skip the checks below */
    }

    /* Handles incoming TCP connections */
    if (FD_ISSET(tcp_listenfd, &rset)) {
        len = sizeof(cliaddr);
        if ((new_confd = accept(tcp_listenfd, (struct sockaddr *) &cliaddr, &len)) < 0) {
            perror("accept");
            exit(1);
        }
        /* Send connection message asking for handle */
        writen(new_confd, handle_msg, strlen(handle_msg));
        /* adds new_confd to array of connected fd's */
        for (i = 0; i < MAXCLIENTS; i++) {
            if (tcp_confd_lst[i] == 0) {
                tcp_confd_lst[i] = new_confd;
                break;
            }
        }
    }

    /* Handles incoming UDP connections */
    if (FD_ISSET(udpfd, &rset)) {

    }

    /* Handles receiving client handles */
    /* If client disconnects without entering their handle, their values in the arrays will be set to 0 and can be reused. */
    for (i = 0; i < MAXCLIENTS; i++) {
        sd = tcp_confd_lst[i];
        if (FD_ISSET(sd, &rset)) {
            if ((valread = read(sd, confd_handle, MAXHANDLESZ)) <= 0) { /* 0 is EOF, < 0 is an error */
                printf("Someone disconnected: %s\n", usr_handles[i]);
                close(sd);
                tcp_confd_lst[i] = 0;
                usr_in_game[i] = 0;
            } else {
                confd_handle[valread] = '\0';
                printf("%s\n", confd_handle); /* For testing */
                fflush(stdout);
                strncpy(usr_handles[i], confd_handle, sizeof(usr_handles[i]));
                for (j = i - 1; j >= 0; j--) {
                    if (tcp_confd_lst[j] != 0 && usr_in_game[j] == 0) { 
                        usr_in_game[i] = 1; usr_in_game[j] = 1;
                        if ((child_pid = fork()) == 0) {
                            close(tcp_listenfd);
                            snprintf(fd_args[0], sizeof(fd_args[0]), "%d", tcp_confd_lst[i]);
                            snprintf(fd_args[1], sizeof(fd_args[1]), "%d", tcp_confd_lst[j]);
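                            execl("nim_match_server", "nim_match_server", usr_handles[i], fd_args[0], usr_handles[j], fd_args[1], (char *) 0);
                            /* only reached if execl failed: do not fall back into the server loop */
                            perror("execl");
                            _exit(127);
                        }
                        close(tcp_confd_lst[i]); close(tcp_confd_lst[j]);
                        tcp_confd_lst[i] = 0; tcp_confd_lst[j] = 0;
                        usr_in_game[i] = 0; usr_in_game[j] = 0;
                        break; /* client i is matched; stop looking for more opponents */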
                    }
                }
            }
        }
    }
}

Is there a way to call wait() even while select() is blocking? Preferably without signal handlers, since signals are asynchronous.

EDIT: I found out that select() takes a struct timeval argument that specifies a timeout. Would using that be a good idea?

mrQWERTY
  • Use the timeout parameter (the last parameter) of select(). The next statement after select() should then check whether a timeout occurred. One way to check for a timeout is to test whether the fd_set is all zeros after the call (or that the target fd entries are zero). – user3629249 Mar 25 '15 at 05:21
  • No! You should not just use a timeout parameter and check whether the timeout occurred. If the timeout is 2 seconds and the program receives a chunk of data every second, the timeout never happens. You should run the timed code every time select() returns, not just in the timeout case. If you want the code to run every second, you could store its last execution time and then check whether cur_time() - last_exec_time is larger than one second. – juhist Mar 25 '15 at 09:04

2 Answers

I think your options are:

  1. Save all your child PIDs in a global array and call wait() from a signal handler. If you don't need the exit status of your children in your main loop, I think this is the easiest.

  2. Instead of select(), use pselect(). It lets you unblock a chosen set of signals only for the duration of the call, so it returns when one of them arrives, in your case SIGCHLD. Then call waitpid() with WNOHANG on all child PIDs. You will need to block/unblock SIGCHLD at the right moments before/after pselect(), see here: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html

  3. Wait on/cleanup child PIDs from a secondary thread. I think this is the most complicated solution (re. synchronization between threads), but since you asked, it's technically possible.

davlet
  • Could method 2 be used in the case where the parent process passed a socket to the child process, and is intending to accept a connection from the child process, while at the same time handling the possibility of the child process exiting due to an error prior to connecting to the socket? This would allow me to handle both child process failure, and child process success with pending socket connection. – CMCDragonkai Jan 21 '17 at 13:33

If you just want to prevent zombie processes, you could set up a SIGCHLD signal handler. If you want to actually wait for the return status, you could write bytes into a pipe (non-blocking, just in case) from the signal handler and then read those bytes in the select loop.

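For how to handle SIGCHLD, see http://www.microhowto.info/howto/reap_zombie_processes_using_a_sigchld_handler.html -- you want to do something like while (waitpid((pid_t)(-1), 0, WNOHANG) > 0) {} so that every child that has already exited is reaped, since several exits can be delivered as a single SIGCHLD.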

Perhaps the best approach is sending a single byte from the SIGCHLD signal handler to the main select loop (non-blocking, just in case) and doing the waitpid loop in the select loop when bytes can be read from the pipe.

You could also use a signalfd file descriptor to read the SIGCHLD signal, although that works only on Linux.

juhist