When I run the following C code in RHEL 7.4:

errno = 0;
status = system("ls >> /tmp/test.txt");
/* Format both values in one call; a second sprintf into msg would overwrite the first. */
sprintf(msg, "Status: %d, value of errno: %d\n", status, errno);
os_log_msg(msg);

I get a return code of -1 and errno = 10 (No child processes). The /tmp/test.txt file is actually created, so the command works, but the program sees a non-zero return code and exits.

The thing is that this command returned a 0 in HP-UX 11.11, but we migrated to RHEL 7.4 and now we get -1.

jww
JEtheDBA
  • Why did you put `extern int errno` there? You should use standard `errno` _macro_ from `<errno.h>` –  Mar 13 '19 at 19:55
  • ... but if you're going to rely on `errno` instead of `status` to check for errors then you should indeed set it to 0 immediately prior to calling `system()` (without declaring it yourself). Also, do not rely on it to keep its value across subsequent calls to standard library functions. – John Bollinger Mar 13 '19 at 20:01
  • @JohnBollinger This is clearly just a [mcve], it doesn't "rely" on either variable, it just prints them both. So the question is why is `system()` returning `-1` for a perfectly valid command. – Barmar Mar 13 '19 at 20:17
  • Does `/tmp/test.txt` contain the `ls` output? – Barmar Mar 13 '19 at 20:17
  • Well no, @Barmar, it's *not* an MCVE. It's lacking in the completeness dimension. Moreover, if the OP indeed does declare `errno` in their real code, then there's not much to say about their specific code beyond that the program exhibits undefined behavior. – John Bollinger Mar 13 '19 at 20:21
  • @JohnBollinger Sorry, what I meant was that it's a simplification that shows the main problem. – Barmar Mar 13 '19 at 20:23
  • Specifically, there's no `if (errno == ...)` that's done without checking `status`. – Barmar Mar 13 '19 at 20:24
  • Yes, I understand that it's a cut down summary, but there is not enough here to answer the question. If I were to speculate based on what I see, I would guess that `/tmp/test.txt` already exists and is not writable by the OP. But who knows what actually relevant code may have been omitted that would have led me to a different guess / conclusion? – John Bollinger Mar 13 '19 at 20:28
  • In any case, there isn't anything special or peculiar about RHEL that would make the example call to the `system()` function inherently incorrect, so that I would expect it to fail. A similar, but complete and correct program runs for me on closely-related CentOS 7, returning 0 from `system()` and setting `errno` to 0. – John Bollinger Mar 13 '19 at 20:37
  • You can't invoke `system()` with a stream redirect. Stream redirection with `>>` is a shell feature. You need to use `strdup2` to redirect the output of a child process. – Jazzwave06 Mar 13 '19 at 20:52
  • Is that actually the full program? You can certainly trigger this error if, for example, you add `SA_NOCLDWAIT` to the flags for the `SIGCHLD` handler. – rici Mar 13 '19 at 21:00
  • @sturcotte06: system uses a shell to execute the argument. Posix standard quote: "The system() function shall behave as if a child process were created using fork(), and the child process invoked the sh utility using execl() as follows: `execl(<shell path>, "sh", "-c", command, (char *)0);`" – rici Mar 13 '19 at 21:02
  • Are you running the program from the command line or is it started by systemd? – stark Mar 13 '19 at 21:20
  • Thanks for the replies! This is not the "real" command in system(). This is a test command to show that even something simple is working, but returning -1. The "real" command calls SQL*Loader. – JEtheDBA Mar 14 '19 at 14:21
  • @stark: The program is typically run from a cron job calling the *.exe file. – JEtheDBA Mar 14 '19 at 14:27
  • @John Bollinger: What else would you like to know? – JEtheDBA Mar 14 '19 at 14:29
  • Also, the "real" program does not have errno at all. I only added that to troubleshoot. So, whether it's "extern int errno = 0;", "errno = 0;", or nothing, system() returns a -1. – JEtheDBA Mar 14 '19 at 14:40
  • @JEtheDBA, I would like a *bona fide* [mcve], as Barmar already requested. If what you've presented so far is in fact characteristic of the problem, then a MCVE needn't be more than a few lines longer. I would also like to know about the existence, ownership, and permissions of the file to which you're trying to redirect the command's output relative to the user running the program. Including SELinux properties if that's enabled in enforcing mode (the default for CentOS 7). – John Bollinger Mar 14 '19 at 14:41
  • @jethedba: it's not that people don't believe you; it's that we need something testable. A complete but short program with a `main()` and `#include`s and, if absolutely necessary, support functions. (How is `os_log_msg` implemented? Does it rely on initialisation code of a complex library? Better to just write to stderr or a temp file.) Then show how you run the program and its precise result. *You* can see all this stuff; it's right in front of you. We can't. So share. – rici Mar 14 '19 at 16:27
  • Anyway, I think @zwol nailed it. The question is figuring out what might be installing or modifying a signal handler. – rici Mar 14 '19 at 16:30

1 Answer

The value −1 can only be returned by system if initial creation of a child process (via fork) or collection of its exit status (via wait) fails. Neither of these things can happen because of a problem with the command passed to system, because the command is interpreted in the child process. Problems with the command will show up as system returning a value s that is not equal to either 0 or −1, and for which either WIFEXITED(s) && WEXITSTATUS(s) != 0 or WIFSIGNALED(s) is true. (The macros WIFEXITED, WIFSIGNALED, and WEXITSTATUS are defined in sys/wait.h.) (See the POSIX specification for system to understand why this happens.)

fork failures typically only happen because of system-wide resource exhaustion and/or severe imposed resource quotas. For instance, this program prints

true: status=-1 errno=11 (Resource temporarily unavailable)

when I run it.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

int main(void)
{
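  /* Limit this user to one process so the fork inside system() fails
     (not enforced for root, so run this as an unprivileged user). */
  struct rlimit rl;
  rl.rlim_cur = 1;
  rl.rlim_max = 1;
  setrlimit(RLIMIT_NPROC, &rl);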

  int status = system("true");
  printf("true: status=%d errno=%d (%s)\n", status, errno, strerror(errno));
  return 0;
}

A wait failure inside system could happen if you have a SIGCHLD handler that steals the wait status. For instance, this program prints

true: status=-1 errno=10 (No child processes)

when I run it. (There are several other ways in which a SIGCHLD handler could interfere with system; this is just the shortest demo program I could think of.)

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <signal.h>

int main(void)
{
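  /* With SIGCHLD ignored, terminated children are reaped automatically,
     so the wait inside system() finds no child and fails with ECHILD. */
  signal(SIGCHLD, SIG_IGN);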

  int status = system("true");
  printf("true: status=%d errno=%d (%s)\n", status, errno, strerror(errno));
  return 0;
}

You said that whatever command you pass to system does execute correctly but system nonetheless returns −1, and that makes me think your problem is due to a bad interaction between wait and a SIGCHLD handler. Getting "No child processes" (ECHILD) in errno is consistent with this hypothesis, because wait is documented to produce that error code, and fork isn't. But it's just a hypothesis. To diagnose your problem any better than this, we need to see a complete test program that we can compile and run for ourselves and observe the exact same failure condition that you are. Please read and follow the instructions at https://stackoverflow.com/help/mcve .

zwol
  • Thanks for the answer. The actual program does not contain errno at all. I added that to troubleshoot. The main problem is that system(cmd) runs whatever is in "cmd" successfully, but ALWAYS returns a -1. Even system() by itself returns a -1. – JEtheDBA Mar 14 '19 at 14:43
  • @JEtheDBA I can think of a reason why this might be happening, but I really need to see a _complete test program_ that reproduces the phenomenon (`system(cmd)` always returns -1) before I can tell you any more. – zwol Mar 14 '19 at 15:27
  • @JEtheDBA Actually, one more thing: When constructing the complete test program, pay attention to whether or not your full program installs any signal handlers, and if so, exactly how it does that. – zwol Mar 14 '19 at 15:28