
I have been reading "The Linux Programming Interface", Chapter 27 ("Program Execution").

I understand that the author demonstrates how to implement the system() library function using fork() and exec(). However, the challenging part is handling signals. In particular, I am confused by the following text:

The first signal to consider is SIGCHLD. Suppose that the program calling system() is also directly creating children, and has established a handler for SIGCHLD that performs its own wait(). In this situation, when a SIGCHLD signal is generated by the termination of the child created by system(), it is possible that the signal handler of the main program will be invoked and collect the child’s status before system() has a chance to call waitpid(). (This is an example of a race condition.)

Here is the book's example implementation, without signal handling:

#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>

int system(char *command)
{
  int status;
  pid_t childPid;

  switch(childPid = fork())
  {
    case -1: /* Error */
      return -1;

    case 0: /* Child */
      execl("/bin/sh", "sh", "-c", command, (char*) NULL);
      _exit(127); /* Only reached if execl() failed */

    default: /* Parent */
      if (waitpid(childPid, &status, 0) == -1)
        return -1;
      else
        return status;
  }
}

I know what a race condition is, but I don't understand the whole scenario the author describes. In particular, I don't understand what "the program calling system()" refers to. What is the "main program"? Which process creates the child processes?

Could someone please explain, with an example in C or pseudocode, how this race condition can arise?

storm
  • It's all just talking about the parent program. We don't care about what happens inside the children (either directly-created or started by our `system` implementation) – Useless Dec 06 '19 at 13:33
  • @Useless What is "the program calling system" in terms of code I provide? What is the "main program"? – storm Dec 06 '19 at 14:13
  • The snippet you show seems incomplete. `system()` is never called. – alk Dec 06 '19 at 14:15
  • @alk I edited the code and now it is exactly like in the book. There is no example where this `system()` is called. As far as I understand, this piece of code is a sort of separate program which is called from another program/process referred to as "the program calling system()". But I am not sure. Then what is "the main program"? – storm Dec 06 '19 at 14:25
  • Do you know what a program is in general? In C, it's something with a `main` entrypoint which is compiled to an executable. It can also refer to the process which is executing that ... executable. I don't know how that's ambiguous once you know we're talking about the parent, and not the children created by calling `fork()`. – Useless Dec 06 '19 at 15:42
  • @Useless Yes, I do know what a program is. Also, this is from the book: "A program is a file containing a range of information that describes how to construct a process at run time." But knowing it does not help me resolve the ambiguity. – storm Dec 06 '19 at 17:02

1 Answer


Suppose the main program has installed a SIGCHLD handler that does `int ws; wait(&ws);`.

If such a handler is allowed to run in response to a SIGCHLD, it races with the waitpid() done in system(). If the handler wins the race, it reaps the child first, the waitpid() in system() then fails with ECHILD, and system() cannot retrieve the child's exit status.

For this reason, POSIX requires system() to block SIGCHLD while it waits for the command to terminate.

You could still have races with wait() calls made in other signal handlers or other threads, but that would be a design error that POSIX `system()` won't protect you from.

#include <unistd.h>
#include <sys/wait.h>
#include <sys/types.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>

int system(char *command)
{
  int status;
  pid_t childPid;

  switch(childPid = fork())
  {
    case -1: /* Error */
      return -1;

    case 0: /* Child */
      execl("/bin/sh", "sh", "-c", command, (char*) NULL);
      _exit(127); /* Only reached if execl() failed */

    default: /* Parent */
      /*usleep(1);*/ /* uncomment to widen the race window */
      if (waitpid(childPid, &status, 0) == -1)
        return -1;
      else
        return status;
  }
}
/* The main program's SIGCHLD handler: reaps any child, preserving errno */
void sigchld(int Sig){ int er=errno; wait(0); errno=er; }
int main()
{
    /*main program*/

    //main program has a sigchld handler
    struct sigaction sa; 
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = sigchld;
    sigaction(SIGCHLD, &sa,0);

    for(;;){
        //the handler may occasionally steal the child status
        if(0>system("true") && errno==ECHILD)
            puts("Child status stolen!");

    }

}
Petr Skocik
  • The child calls execl(), replacing itself with a shell. So, does that mean that both the handler and the system function wait for the shell process? – storm Dec 06 '19 at 18:56
  • I don't see any race in this situation. If, say, the handler collects the child process first while system() is sleeping, then the child is already gone by the time system() calls waitpid() on childPid, so waitpid() returns immediately. There is no infinite wait; system() just can't get the return status. – storm Dec 06 '19 at 19:09
  • @HardFork Yes, it's a simple race for who gets the exit status of the child first -- the system function or the handler. – Petr Skocik Dec 06 '19 at 23:04
  • Signal blocking is thread-local (at least on Linux). Won't the SIGCHLD just be delivered to another thread if it's blocked? – dyp May 16 '23 at 10:43
  • @dyp Yes. On all POSIX platforms. `system()` doesn't play super nice with multithreading. `system()` is also required to ignore `SIGINT` and `SIGQUIT` for the duration of the child's life, and that can be even worse for multithreading, as these signal dispositions are shared among all threads and the changes happen without any inter-thread coordination. – Petr Skocik May 16 '23 at 10:54