Pclose seems to make process fail

Question

This question is a follow-up to: Controlling a C daemon from another program

My goal is to control daemon process execution from another program.
The daemon's code is really simple.

#include <stdio.h>
#include <stdlib.h>
#include <syslog.h>
#include <unistd.h>

int main()
{
  printf("Daemon starting ...\n");
  openlog("daemon-test", LOG_PID, LOG_DAEMON);

  syslog(LOG_INFO, "Daemon started !\n");

  while(1)
  {
    syslog(LOG_INFO, "Daemon alive - pid=%d, pgid=%d\n", getpid(), getpgrp());
    sleep(1);
  }

  return EXIT_SUCCESS;
}

I have implemented a SystemV init script for this daemon as follows:

#!/bin/sh

NAME=daemon-test
DAEMON=/usr/bin/${NAME}
SCRIPTNAME=/etc/init.d/${NAME}
USER=root
RUN_LEVEL=99
PID_FILE=/var/run/${NAME}.pid
RETRY=3

start_daemon()
{
    start-stop-daemon --start --background --name ${NAME} --chuid ${USER} --nicelevel ${RUN_LEVEL} --make-pidfile --pidfile ${PID_FILE} --exec ${DAEMON}
    ret=$?

    if [ "$ret" -eq 0 ]; then
        echo "'${NAME}' started"
    elif [ "$ret" -eq 1 ]; then
        echo "'${NAME}' is already running"
    else
        echo "An error occurred starting '${NAME}'"
    fi
    return ${ret}
}

stop_daemon()
{
    start-stop-daemon --stop --retry ${RETRY} --remove-pidfile --pidfile ${PID_FILE} --name ${NAME} --signal 9
    ret=$?

    if [ "$ret" -eq 0 ]; then
        echo "'${NAME}' stopped"
    elif [ "$ret" -eq 1 ]; then
        echo "'${NAME}' is already stopped"
    elif [ "$ret" -eq 2 ]; then
        echo "'${NAME}' not stopped after ${RETRY} tries"
    else
        echo "An error occurred stopping '${NAME}'"
    fi
    return ${ret}
}

status_daemon()
{
    start-stop-daemon --status --pidfile ${PID_FILE} --name ${NAME}
    ret=$?

    if [ "$ret" -eq 0 ]; then
        echo "'${NAME}' is running"
    elif [ "$ret" -eq 1 ]; then
        echo "'${NAME}' stopped but pid file exists"
    elif [ "$ret" -eq 3 ]; then
        echo "'${NAME}' stopped"
    elif [ "$ret" -eq 4 ]; then
        echo "Unable to get '${NAME}' status"
    else
        echo "Unknown status : ${ret}"
    fi
    return ${ret}
}

case "$1" in
  start)
    echo "Starting '${NAME}' daemon :"
    start_daemon
    ;;
  stop)
    echo "Stopping '${NAME}' daemon :"
    stop_daemon
    ;;
  status)
    echo "Getting '${NAME}' daemon status :"
    status_daemon
    ;;
  restart|reload)
    "$0" stop
    "$0" start
    ;;
  *)
    echo "Usage: $0 {start|stop|status|restart}"
    exit 1
esac

exit $?

Using this script from command line to control the daemon execution works well.

So the aim now is to use this script from another C program to launch the daemon and control its execution.

I have implemented a simple C program which:

Launches the script with the 'start' argument
Waits for the pid file to be created
Reads the daemon's pid from the pid file
Periodically checks that the daemon is alive by checking that /proc/<daemon_pid>/exe exists
Relaunches the daemon if it has been killed

And here is the issue I'm facing. The program works well only if I don't call pclose.

Here is the code of the program

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define DAEMON_NAME       "daemon-test"
#define DAEMON_START_CMD  "/etc/init.d/" DAEMON_NAME " start"
#define DAEMON_STOP_CMD   "/etc/init.d/" DAEMON_NAME " stop"
#define DAEMON_PID_FILE   "/var/run/" DAEMON_NAME ".pid"

int main()
{
    char daemon_proc_path[256];
    FILE* daemon_pipe = NULL;
    int daemon_pid = 0;
    FILE* fp = NULL;
    int ret = 0;
    int i = 0;

    printf("Launching '%s' program\n", DAEMON_NAME);
    if(NULL == (daemon_pipe = popen(DAEMON_START_CMD, "r")))
    {
        printf("An error occurred launching '%s': %m\n", DAEMON_START_CMD);
        return EXIT_FAILURE;
    }
    #ifdef USE_PCLOSE
    else if(-1 == (ret = pclose(daemon_pipe)))
    {
        printf("An error occurred waiting for '%s': %m\n", DAEMON_START_CMD);
        return EXIT_FAILURE;
    }
    #endif
    else
    {
        printf("Script exit status : %d\n", ret);

        while(0 != access(DAEMON_PID_FILE, F_OK))
        {
            printf("Waiting for pid file creation\n");
            sleep(1);
        }
        if(NULL == (fp = fopen(DAEMON_PID_FILE, "r")))
        {
            printf("Unable to open '%s'\n", DAEMON_PID_FILE);
            return EXIT_FAILURE;
        }
        fscanf(fp, "%d", &daemon_pid);
        fclose(fp);
        printf("Daemon has pid=%d\n", daemon_pid);
        sprintf(daemon_proc_path, "/proc/%d/exe", daemon_pid);
    }

    while(1)
    {
        if(0 != access(daemon_proc_path, F_OK))
        {
            printf("\n--- Daemon (pid=%d) has been killed ---\n", daemon_pid);
            printf("Relaunching new daemon instance...\n");
            if(NULL == (daemon_pipe = popen(DAEMON_START_CMD, "r")))
            {
                printf("An error occurred launching '%s': %m\n", DAEMON_START_CMD);
                return EXIT_FAILURE;
            }
            #ifdef USE_PCLOSE
            else if(-1 == (ret = pclose(daemon_pipe)))
            {
                printf("An error occurred waiting for '%s': %m\n", DAEMON_START_CMD);
                return EXIT_FAILURE;
            }
            #endif
            else
            {
                printf("Script exit status : %d\n", ret);

                while(0 != access(DAEMON_PID_FILE, F_OK))
                {
                    printf("Waiting for pid file creation\n");
                    sleep(1);
                }
                if(NULL == (fp = fopen(DAEMON_PID_FILE, "r")))
                {
                    printf("Unable to open '%s'\n", DAEMON_PID_FILE);
                    return EXIT_FAILURE;
                }
                fscanf(fp, "%d", &daemon_pid);
                fclose(fp);
                printf("Daemon has pid=%d\n", daemon_pid);
                sprintf(daemon_proc_path, "/proc/%d/exe", daemon_pid);
            }
        }
        else
        {
            printf("Daemon alive (pid=%d)\n", daemon_pid);
        }
        sleep(1);
    }

    return EXIT_SUCCESS;
}

From what I understood pclose is supposed to wait for child process termination and only when the child process has returned, it closes the pipe.

So I don't understand why my implementation with pclose doesn't work when it works without calling it.

Here are the logs with and without the pclose block commented

Without calling pclose:

# ./popenTest 
Launching 'daemon-test' program
Script exit status : 0
Waiting for pid file creation
Daemon has pid=435
Daemon alive (pid=435)
Daemon alive (pid=435)
Daemon alive (pid=435)
Daemon alive (pid=435)

With a call to pclose:

# ./popenTest 
Launching 'daemon-test' program
Script exit status : 36096
Waiting for pid file creation
Waiting for pid file creation
Waiting for pid file creation
Waiting for pid file creation

As you can see, the daemon is never launched and the pid file is never created either.

Even though my program works without pclose, I would like to understand the underlying issue with the call to pclose.

Why does calling pclose make the program fail when it behaves correctly without it?

EDIT:

Here is some more information about the error case:

errno is Success
WIFEXITED macro returns true
WEXITSTATUS macro returns 141

Debugging further, I noticed that modifying the init script to log its output to a file makes it work. Why?

I don't get why you use `popen()` at all, if you don't actually read from the pipe, and why you call a script instead of launching your "daemon" directly (which would be very simple if you make it a real daemon detaching itself and starting a new session) -- but anyways, try checking `errno` and use the [`WIFEXITED`, `WEXITSTATUS` etc. macros](https://linux.die.net/man/2/waitpid) to get more information about how the child exited. — , Jul 03 '18 at 15:59
I'm doing so because I shall not modify the daemon's code. I can't get its pid without using `start-stop-daemon` except by reimplementing `fork` + `exec` to have its pid returned. Moreover, it allows me to manage the daemon from the command line if the controlling app is stopped (which may happen; in that case the daemon shall still run). And it's more compliant with what is already present on the BSP — Arkaik, Jul 03 '18 at 16:21
You don't need to "reimplement" fork/exec, just **use** them. What you do instead looks like a Rube Goldberg machine. A **real** daemon would detach itself and write the pid file, btw. — , Jul 03 '18 at 16:43
Are the commented out blocks of code ones which you make active when you use `pclose()`? Why don't you have the code able to choose at runtime whether to execute the `pclose()`? Then we wouldn't have to guess — you'd say "when I run the command with no arguments, `pclose()` is not executed, and the output is this; when I run it with any arguments, `pclose()` _is_ executed, and the output is this". And you'd be able to test both variants without having to recompile. — Jonathan Leffler, Jul 03 '18 at 16:54
And alright, a "real" daemon would detach itself, but as I said I shall not modify the daemon's code (which is not the one in my example). Moreover, it's simpler (at least from my point of view) to centralize daemon management using a service management system (i.e. SystemV) — Arkaik, Jul 03 '18 at 16:55
@JonathanLeffler I could have done that, you're right; however, I thought that specifying "Here are the logs with and without the pclose block commented" would be sufficient for my issue to be understood. — Arkaik, Jul 03 '18 at 17:01
Comments in code on SO are usually ignored — they're mainly irrelevant when trying to understand the operation of code, and when dealing with people's problems, the first thing I do is strip out the comments. Now I've got to hack at your code so it can be run both ways. It would have been better to use `#ifdef CALL_PCLOSE` / `#endif` around the code, so it can be compiled both ways without needing to be edited. It all comes under the heading of making it easy for people to help you. — Jonathan Leffler, Jul 03 '18 at 17:03

Jonathan Leffler · Accepted Answer · 2018-07-03T17:22:05.787

You use popen(DAEMON_START_CMD, "r"). That means your 'daemon watcher' is reading the standard output of your 'daemon starter' script. If you pclose() that pipe, the script writes to standard output and gets a SIGPIPE because the read end of the pipe is closed. Whether that happens before the actual daemon is started or not is open to debate — and timing issues.
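The numbers reported in the question's edit fit this explanation. The "Script exit status" of 36096 is a raw wait status; decoded, it gives the exit status 141 mentioned in the edit, and shells conventionally report 128 + N for a command killed by signal N. Since 141 - 128 = 13, which is SIGPIPE on Linux, the script was most likely killed while writing to the closed pipe. A minimal sketch of that decoding, assuming a Linux system (the helper name `signal_from_exit_status` is made up for illustration):

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical helper: given a wait status as returned by pclose() or
 * waitpid(), return N if the shell exited with status 128 + N (its
 * convention for "my command was killed by signal N"), or 0 otherwise. */
int signal_from_exit_status(int status)
{
    if (WIFEXITED(status) && WEXITSTATUS(status) > 128)
        return WEXITSTATUS(status) - 128;
    return 0;
}
```

Calling `signal_from_exit_status(36096)` and comparing the result with `SIGPIPE` is a quick way to confirm the diagnosis in the watcher itself.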

Don't pclose() the pipe until you know the daemon starter has exited, by some means or other. Personally, I'd use pipe(), fork() and execv() (or some other variant of the exec family of functions) directly. I don't think popen() is the right tool for the job. But if you're going to use popen(), then read the input until you get no more (EOF), then use pclose() safely. You don't have to print what you read, though it would be conventional and sensible to do so, since the 'daemon starter' script is telling you useful information.
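If you stay with popen(), the read-until-EOF approach could look roughly like the sketch below. The helper name `run_and_drain` and the 256-byte line buffer are arbitrary choices for illustration, not anything from the question's code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: run `cmd` through popen(), echo everything it
 * prints, and call pclose() only after EOF has been reached.
 * Returns the command's exit status, or -1 on error or abnormal exit. */
int run_and_drain(const char *cmd)
{
    char line[256];
    int status;
    FILE *pipe_fp = popen(cmd, "r");

    if (pipe_fp == NULL)
        return -1;

    /* Keeping the read end open while the command writes means the
     * command cannot be hit by SIGPIPE on this pipe. */
    while (fgets(line, sizeof line, pipe_fp) != NULL)
        fputs(line, stdout);

    status = pclose(pipe_fp);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

In the watcher this would be called as `run_and_drain(DAEMON_START_CMD)`. One caveat: EOF only arrives once every process holding the write end of the pipe has closed it, so if a long-running child kept the script's standard output open, the loop would block until that child exited.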

The classic way to check whether a process ID is still running is to use kill(daemon_pid, 0). If the process executing that is appropriately privileged (same UID as the process, or root privileges), this works. It won't help if you can't send an active signal to the PID.
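A minimal sketch of that check follows. The helper name `process_exists` is hypothetical, and treating EPERM as "the process exists" is a choice made here: kill() reports EPERM when the target exists but the caller may not signal it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a process with this pid exists,
 * 0 if it does not, and -1 if the check itself could not be made. */
int process_exists(pid_t pid)
{
    if (pid <= 0)
        return -1;      /* 0 and negative values address process groups */
    if (kill(pid, 0) == 0)
        return 1;       /* signal 0: error checking only, nothing is sent */
    if (errno == EPERM)
        return 1;       /* exists, but we are not allowed to signal it */
    if (errno == ESRCH)
        return 0;       /* no such process */
    return -1;
}
```

The watcher loop could call `process_exists(daemon_pid)` instead of testing `/proc/<pid>/exe` with access(). Like any pid-based check, it cannot tell whether the pid has since been reused by an unrelated process.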

(I assume start-stop-daemon is a program, probably a C program rather than a shell script, that launches another program as a daemon. I have a similar program that I call daemonize, and it too is intended to convert programs not specifically designed as daemons into programs running as daemons. Many programs don't work well as daemons: consider what daemonizing ls, grep, ps, or sort would mean. Other programs can more sensibly be run as daemons.)
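For the fork()/exec route suggested above, a rough sketch is shown here. It leaves out the pipe() part, so the script simply inherits the watcher's standard output and there is no pipe left unread. The helper name `run_command` is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with the given NULL-terminated
 * argument vector and wait for it to finish.
 * Returns the exit status, or -1 on error or abnormal termination. */
int run_command(char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);     /* only reached if execv() failed */
    }

    /* Retry if the wait is interrupted by a signal handler. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the init script in the question, the argument vector would be `{"/etc/init.d/daemon-test", "start", NULL}`; the returned value is then the script's own exit code (0, 1, and so on) rather than a raw wait status.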

Thanks for this information; I have been able to find more documentation about `popen` and `SIGPIPE`. I think you're right and `popen` doesn't seem to be the right solution, so I'll give `fork` and `exec` a try — Arkaik, Jul 04 '18 at 13:21
I have finally reimplemented it using `fork`, `execl` and `waitpid` and it works like a charm. The problem was indeed coming from the pipe being unread. Thanks — Arkaik, Jul 05 '18 at 08:53
