I have a C++ program that runs as a Linux service. Some of its command-line options simply write values to its configuration files and then exit, so the running service would normally have to be restarted to pick up the new config. To keep the service running without a manual restart, it works as follows:

  • background service starts on system boot
    • background service creates a 'config watchdog' thread monitoring config files
  • user runs 'progname options' from command line
    • config files modified
    • command line instance of program exits
    • background service config watchdog thread detects changes to config, triggers restart

When the program restarts to pick up the new config, I call execv so that it stays in the same process as the original instance and can continue to be managed as a service. The problem is that execv is not behaving as expected: the existing process terminates and the program comes back up in a new one. Because the PID no longer matches, 'service progname stop/restart' no longer works properly after this point: 'stop' leaves the service running, and 'restart' spawns a duplicate instance of the program.

I have confirmed that the argv[0] passed to execv is the full path to the executable, so no PATH search should be involved (execv, unlike execvp, never searches PATH in any case). I have read about PATH lookups causing similar problems in other applications.

rdowell
  • It is true. All exec-family functions *replace* the current process. Also, consider using the traditional SIGHUP instead of a watcher thread. – KAction Dec 18 '12 at 18:02
  • Yes, exec-family functions replace the current process image, but since no new process is created, the PID should not change. Attaching to the service instance with gdb, I find that when execv() is called, gdb initially follows the exec call correctly, prints 'executing new program /path/to/program', and reloads all the debugging symbols, but then detaches and prints 'Detaching after fork from child process XXX' with the PID of the 'new' instance. There are no calls to fork() anywhere in my program, so it appears the execv call is causing a fork for some reason. – rdowell Dec 18 '12 at 19:02
  • So far, the principle of elimination would lead us to believe that your program does fork, and you should be hunting for the fork! How about some more gdb-fu (e.g. "catch fork", or "break fork" set after the exec)? There might be some other design issues here! A watchdog thread is fair enough; exec'ing the service is diabolical(!!); allowing duplicate instances is a problem in itself (create a lockfile with flock or lockf). – Nicholas Wilson Dec 18 '12 at 19:56

1 Answer

Found the issue. The program calls daemon() when it starts, and daemon() forks internally (the parent exits and the child carries on). When the program restarted itself via execv, it called daemon() again, and that second fork is what changed the PID. After changing the program to distinguish between a fresh start and a restart, and to skip daemon() on restart, the problem is fixed.

rdowell