
I have been experimenting with signals and I am facing a problem I cannot explain.

I have recreated my issue in this simple C program. In a nutshell, I am reading user input in a loop using getline(). The user can fork the process, kill the child process, or exit the main process altogether.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

int counter = 0;

void handler(int signum){
    counter++;
}

int main(){
    int bool = 1;
    char *input;
    size_t  size=100;
    input = malloc(sizeof(char)*100);
    memset(input,'\0',size);
    pid_t id;

    struct sigaction sa;

    do{
        printf("counter=%d\n",counter);
        getline(&input,&size,stdin);
        if( strncmp(input,"fork",4) == 0 ){

            id = fork();
            if( id == 0 ){//child
                while(1) sleep(1);
                free(input);
                return 0;
            }else if( id > 0 ){//parent
                sa.sa_handler = handler;
                sigaction(SIGCHLD, &sa, NULL);
            }else{//fork failed
                free(input); return -1;
            }

        }else if( strncmp(input,"kill",4) == 0 ){
            kill(id,9);
        }else if( strncmp(input,"exit",4) == 0 ){ 
            bool = 0;
        }
        
    }while(bool == 1);

    free(input);
    return 0;
}

The strange thing is that if I fork a child process and then kill it, in other words typing to the stdin:

fork

kill

I get stuck in an infinite loop where the following is printed to stdout indefinitely (which is also an indication that the SIGCHLD was caught when the child was killed):

counter=1

If I remove the signal handler everything seems to work fine. I know that getline() uses the read() syscall and that the SIGCHLD signal causes its interruption, but apart from that I am almost certain that in the next iteration the getline() function should work just fine. Does anyone have an explanation for why getline() stops working?

(I am using the gcc compiler and executing the program on Ubuntu 20.04 LTS)

marc_s
Martian
  • Regarding `char *input;` and `getline(&input,&size,stdin);`: the call to `getline()` will use whatever trash happens to be on the stack at the address of `input` as the first parameter. (This might be masked by your IDE setting all of the stack to 0x00, but you cannot depend on that behavior.) Strongly suggest replacing the declaration of `input` with `char *input = NULL;` – user3629249 May 11 '21 at 14:36
  • You are right on that, thank you for pointing it out. I will edit the post and add a memset() call in order to avoid any confusion, even though the solution you mentioned is absolutely correct, since getline will allocate any memory needed if the input argument is set to NULL. – Martian May 12 '21 at 07:49

2 Answers


The reason is that when the read() syscall is interrupted (the parent process receives SIGCHLD and read() fails with EINTR), getline() sets the stream's error indicator. This is documented in POSIX's getline:

If an error occurs, the error indicator for the stream shall be set, and the function shall return -1 and set errno to indicate the error.

If the signal is delivered to the parent before it enters the read() system call, it is handled before the system call starts, so read() does not fail with EINTR. That's why you may not always see the infinite loop on the getline() call.

"but apart from that I am almost certain that in the next iteration the getline() function should work just fine."

Once a stream's error indicator is set, it is not cleared automatically on the next call. So you have to clear it yourself with clearerr().
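
As a minimal sketch of that idea, the wrapper below retries getline() after clearing the error indicator when the failure was caused by a signal. The name getline_retry is hypothetical (it is not a library function), and the sketch assumes that errno is still EINTR when getline() returns -1 after an interrupted read(), which matches what is observed on glibc.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper (not part of any library): behaves like getline(),
 * but retries when a signal handler interrupted the underlying read(). */
ssize_t getline_retry(char **line, size_t *cap, FILE *fp)
{
    for (;;) {
        errno = 0;
        ssize_t n = getline(line, cap, fp);
        if (n >= 0)
            return n;            /* got a line */
        if (errno == EINTR && ferror(fp)) {
            clearerr(fp);        /* drop the sticky error indicator */
            continue;            /* and try again */
        }
        return -1;               /* end of file or a genuine error */
    }
}
```

In the loop from the question you would call getline_retry(&input, &size, stdin) instead of getline(), and still check for -1 so that end of input terminates the loop.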

Note that this behaviour comes from the requirements on getline(), not from the interrupted read() system call itself. If you were to call read() directly on the file descriptor STDIN_FILENO in a loop, the next iteration would work as you expected, i.e. no infinite loop.
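
To illustrate the difference, here is a small sketch of calling read() directly and simply retrying on EINTR. There is no stream involved, so there is no error indicator to clear. The helper name read_retry is hypothetical.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: read() that is retried whenever a signal
 * handler interrupts it before any data was transferred. */
ssize_t read_retry(int fd, void *buf, size_t len)
{
    ssize_t n;
    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);
    return n;    /* > 0: bytes read, 0: end of file, -1: real error */
}
```

For example, read_retry(STDIN_FILENO, buf, sizeof buf) returns raw bytes rather than a line, so you would have to find the newline yourself.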

Alternatively, you could tell the system to restart interrupted system calls automatically with the SA_RESTART flag:

sa.sa_flags = SA_RESTART;

In that case, read() is restarted automatically after the signal handler returns, so the EINTR is handled transparently and never reaches getline().


P.S.: you should initialize sa with:

struct sigaction sa = {0};

and initialize the signal mask to the empty set with sigemptyset:

sigemptyset(&sa.sa_mask);

because you're only setting sa_handler, and the rest of the fields are left uninitialized!

P.P
  • You went into great depth in order to answer my question, and I am really grateful for that. I also got a better understanding of why the problem occurs. About the SA_RESTART solution, I had it in mind, but later in my program I have to interrupt a read() call from a file, so for my rare scenario it cannot help. clearerr() seems the better option for me, but in general SA_RESTART might be better practice. Also thank you for pointing out the need to initialize sa, I had forgotten about it. – Martian May 10 '21 at 21:35
  • Zero-initializing a `struct sigaction` leaves the `sa_mask` field with an unspecified value. You need to do `sigemptyset(&sa.sa_mask)` (and then add any signals that need blocking). – zwol May 27 '21 at 15:56
  • @zwol Indeed. Updated the answer. – P.P Jun 12 '21 at 09:13

On onlinegdb.com I could not always reproduce the problem. Sometimes it seems to work as expected, sometimes I get repeated errors reported by getline.

By setting errno = 0 before calling getline and checking both the return value of getline and errno afterwards, I found out that getline repeatedly returns -1. On the first call it sets errno = EINTR (perror reports "Interrupted system call"); on the subsequent calls, errno remains 0 ("Success").

    /* ... */
    do{
        printf("counter=%d\n",counter);
        errno = 0;
        if(getline(&input,&size,stdin) < 0)
        {
            static int i = 20; // to avoid endless loop
            perror("getline");
            if(--i == 0) return 1;
        }
    /* ... */

Apparently, in some/many cases the signal sets a permanent error condition of the input stream stdin.

The permanent error can be cleared by calling clearerr.

Unfortunately, I have not (yet) found documentation that explains this behavior.

    /* ... */
    do{
        printf("counter=%d\n",counter);
        errno = 0;
        if(getline(&input,&size,stdin) < 0)
        {
            perror("getline");
            if(errno == EINTR)
            {
                //clearerr(stdin); // clearing here would avoid the 2nd error return
            }
            else if(errno == 0)
            {
                clearerr(stdin);
            }
            else
            {
                return 2;
            }
        }
    /* ... */
Bodo
  • First of all, thank you for your answer. Now I can totally see why an error stays in stdin, and using the clearerr() function is the best solution I have come across so far. I am really grateful for your help. – Martian May 10 '21 at 21:26