10

I'm currently reading up on and experimenting with the different possibilities of running programs from within C code on Linux. My use cases cover all possible scenarios, from simply running and forgetting about a process, reading from or writing to the process, to reading from and writing to it.

For the first two, popen() is very easy to use and works well. I understand that it uses some version of fork() and exec() internally, then invokes a shell to actually run the command.

For the third scenario, popen() is not an option, as it is unidirectional. Available options are:

  • Manually fork() and exec(), plus pipe() and dup2() for input/output
  • posix_spawn(), which internally uses the above as need be
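
For illustration, the first option above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `run_bidirectional` is a hypothetical helper name, it assumes the input is small enough to fit in the pipe buffer (larger inputs would need interleaved reads and writes to avoid a deadlock), and it assumes descriptors 0 and 1 are already open in the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] with pipes on its stdin and stdout.
 * Writes `input` to the child, then reads up to outsz-1 bytes of output.
 * Returns the child's exit status, or -1 on error. */
int run_bidirectional(char *const argv[], const char *input,
                      char *out, size_t outsz)
{
    int to_child[2], from_child[2];

    if (outsz == 0 || pipe(to_child) == -1)
        return -1;
    if (pipe(from_child) == -1) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid == -1) {
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wire the pipe ends to stdin/stdout, then exec directly. */
        dup2(to_child[0], STDIN_FILENO);
        dup2(from_child[1], STDOUT_FILENO);
        close(to_child[0]);
        close(to_child[1]);
        close(from_child[0]);
        close(from_child[1]);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    /* Parent: keep only the ends it uses. */
    close(to_child[0]);
    close(from_child[1]);

    /* Assumption: input is small; a short or failed write is ignored here. */
    ssize_t written = write(to_child[1], input, strlen(input));
    (void)written;
    close(to_child[1]); /* child sees EOF on stdin */

    size_t len = 0;
    ssize_t n;
    while (len + 1 < outsz &&
           (n = read(from_child[0], out + len, outsz - 1 - len)) > 0)
        len += (size_t)n;
    out[len] = '\0';
    close(from_child[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with `{"cat", NULL}` and a short string should hand the same string back, with no `sh` process involved.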

What I noticed is that these can achieve the same as popen() does, but let us completely avoid invoking an additional sh. This sounds desirable, as it seems less complex.

However, I noticed that even examples on posix_spawn() that I found on the Internet do invoke a shell, so it would seem there must be a benefit to it. If it is about parsing command line arguments, wordexp() seems to do an equally good job.
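
To make the wordexp() idea concrete, here is a rough sketch of splitting and expanding a command line with wordexp() and starting the result with posix_spawnp(). The helper name `spawn_without_shell` is made up for this example. Two caveats apply: wordexp() rejects unquoted shell operators such as `|`, `;`, `<` and `>`, so pipelines are not covered, and some C libraries may implement wordexp() itself by running a shell, so the saving is not guaranteed everywhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <wordexp.h>

extern char **environ;

/* Hypothetical helper: expand `cmdline` with wordexp() and run the result
 * directly via posix_spawnp().
 * Returns the child's exit status, or -1 on error. */
int spawn_without_shell(const char *cmdline)
{
    wordexp_t we;

    /* WRDE_NOCMD makes wordexp() fail on $(...) command substitution. */
    if (wordexp(cmdline, &we, WRDE_NOCMD) != 0)
        return -1;
    if (we.we_wordc == 0) {
        wordfree(&we);
        return -1;
    }

    pid_t pid;
    int rc = posix_spawnp(&pid, we.we_wordv[0], NULL, NULL,
                          we.we_wordv, environ);
    wordfree(&we);
    if (rc != 0)
        return -1;

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this sketch, something like `spawn_without_shell("ls -l *.c")` gets its glob expanded, while `spawn_without_shell("ls | wc")` fails in wordexp() because of the unquoted `|`.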

What is the benefit of invoking a shell to run the desired process instead of running it directly?


Edit: I realized that my wording of the question didn't precisely reflect my actual interest - I was more curious about the benefits of going through sh rather than the (historical) reason, though both are obviously connected, so answers for both variations are equally relevant.

domsson
  • How would you handle `popen("cmd1; cmd2 | grep foo")` without invoking shell? You'd have to basically implement shell inside standard library. – el.pescado - нет войне Feb 20 '18 at 12:09
  • @el.pescado good point, also mentioned in [one of the answers](https://stackoverflow.com/a/48884997/3316645). In practice, I would advise my users (the commands ultimately come from the user) to wrap such things in an executable bash script, then run _that_ through my program. That should work, I suppose. But definitely an important difference between using the shell and not using it. – domsson Feb 20 '18 at 12:12
  • I've often thought it would have been better for `system()`, `popen()` etc. to accept `execl()`-style argument lists, and only start a shell if you ask for that explicitly (e.g. `popen(get_shell(), "-c", "$commands")`). But it is the way it is, for good or bad. OTOH, you're in luck, because you can write your own function to do what you want, and you can read the implementations of `popen()` to crib from them. – Toby Speight Feb 21 '18 at 09:00
  • @TobySpeight agreed. I ended up using `popen()` for where I only need unidirectional communication, `posix_spawn()` to start and forget about a process and wrote my own function for bidirectional communication. All seems to work well so far and it helped me to get a better understanding of the functions involved. – domsson Feb 21 '18 at 09:09

3 Answers

7

Invoking a shell allows you to do all the things that you can do in a shell. For example,

FILE *fp = popen("ls *", "r");

is possible with popen(): the shell expands * to all files in the current directory. Compare it with:

execvp("/bin/ls", (char *[]){"/bin/ls", "*", NULL});

You can't get the same result by exec'ing ls with * as an argument, because exec(2) passes * to ls literally; no shell is involved to expand it.

Similarly, pipes (|), redirection (>, <, ...), etc., are possible with popen.

Otherwise, there's no reason to use popen if you don't need a shell - it's unnecessary. You'll end up with an extra shell process, and all the things that can go wrong in a shell can go wrong in your program (e.g., the command you pass could be interpreted by the shell in a way you didn't intend, which is a common security issue). popen() is designed that way. A fork + exec solution is cleaner, without the issues associated with a shell.
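
As a rough illustration of the interpretation problem, the sketch below runs `echo` with an untrusted argument in two ways: once through popen(), where sh re-parses the string, and once through fork + exec, where the argument arrives as a single literal argv entry. The function names `echo_via_shell` and `echo_direct` are invented for this example, and error handling is kept minimal.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Read up to outsz-1 bytes from fp into out and NUL-terminate it. */
static void slurp(FILE *fp, char *out, size_t outsz)
{
    size_t len = fread(out, 1, outsz - 1, fp);
    out[len] = '\0';
}

/* Via the shell: sh parses the whole string, so a ';' inside arg
 * starts a second command. Returns 0 on success, -1 on error. */
int echo_via_shell(const char *arg, char *out, size_t outsz)
{
    char cmd[256];
    int n = snprintf(cmd, sizeof cmd, "echo %s", arg);
    if (outsz == 0 || n < 0 || (size_t)n >= sizeof cmd)
        return -1;

    FILE *fp = popen(cmd, "r");
    if (fp == NULL)
        return -1;
    slurp(fp, out, outsz);
    return pclose(fp) == -1 ? -1 : 0;
}

/* Without a shell: arg reaches echo unchanged as one argument.
 * Returns 0 on success, -1 on error. */
int echo_direct(const char *arg, char *out, size_t outsz)
{
    int fds[2];
    if (outsz == 0 || pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execlp("echo", "echo", arg, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    close(fds[1]);
    FILE *fp = fdopen(fds[0], "r");
    if (fp == NULL) {
        close(fds[0]);
        waitpid(pid, NULL, 0);
        return -1;
    }
    slurp(fp, out, outsz);
    fclose(fp);
    return waitpid(pid, NULL, 0) == -1 ? -1 : 0;
}
```

Passing `"hi; echo INJECTED"` to both shows the difference: the shell version runs two commands, while the direct version prints the string, semicolon included, as plain text.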

P.P
  • Regarding expansion, that did occur to me, but that's why I mentioned `wordexp()` - it would expand the `*` similar to how the shell would, right? The pipes, however, are indeed a good point. – domsson Feb 20 '18 at 12:10
  • Then you'll end up re-implementing a shell ;-) – P.P Feb 20 '18 at 12:12
  • True. However, it still saves on the additional `sh` process, which I find quite desirable. But that obviously depends on the use case and also on taste. – domsson Feb 20 '18 at 12:14
  • See `popen` as a convenience. It's designed (and *required* by POSIX) to invoke a shell, so the standard `popen` function isn't going to change its behaviour. If your requirement is different, you can implement a custom one. But if you don't need shell functionality, then it's always better to use fork+exec. – P.P Feb 20 '18 at 12:20
4

The glib answer is: because the POSIX standard ( http://pubs.opengroup.org/onlinepubs/9699919799/functions/popen.html ) says so. Or rather, it says that popen() should behave as if the command argument were passed to /bin/sh for interpretation.

So I suppose a conforming implementation could, in principle, also have some internal library function that would interpret shell commands without having to fork and exec a separate shell process. I'm not actually aware of any such implementation, and I suspect getting all the corner cases correct would be pretty tricky.
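
For reference, the POSIX description of popen() phrases the "as if" behaviour in terms of the child calling `execl(shell path, "sh", "-c", command, (char *)0)`. A simplified read-only sketch of that could look like the following. `popen_read_sketch` and `pclose_sketch` are hypothetical names, `/bin/sh` is assumed as the shell path, and details a real popen() must handle (write mode, tracking the child per stream, closing streams from earlier popen() calls in the child) are left out.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical read-only popen(): returns a stream connected to the
 * command's stdout and stores the child's pid in *pid_out. */
FILE *popen_read_sketch(const char *command, pid_t *pid_out)
{
    int fds[2];
    if (pipe(fds) == -1)
        return NULL;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return NULL;
    }
    if (pid == 0) {
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        /* The shell, not this code, parses pipes, globs and redirections. */
        execl("/bin/sh", "sh", "-c", command, (char *)NULL);
        _exit(127); /* only reached if exec failed */
    }

    close(fds[1]);
    FILE *fp = fdopen(fds[0], "r");
    if (fp == NULL) {
        close(fds[0]);
        waitpid(pid, NULL, 0);
        return NULL;
    }
    *pid_out = pid;
    return fp;
}

/* Hypothetical counterpart to pclose(): closes the stream and returns
 * the raw wait status of the shell, or -1 on error. */
int pclose_sketch(FILE *fp, pid_t pid)
{
    int status;
    fclose(fp);
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return status;
}
```

Because the command string goes to `sh -c` untouched, something like `popen_read_sketch("echo hello | tr a-z A-Z", &pid)` works without this code knowing anything about pipelines.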

janneb
  • Interesting. In that case, I wonder what's the reason behind the standard? Trying some simple programs with both, `popen()` and `posix_spawnp()`, seems to yield identical results so far, just that I save one additional `sh` process with the latter. Even bash scripts are being executed nicely, as is indicated by the [manpage of `execve`](https://linux.die.net/man/2/execve). What kind of corner cases might I run into when doing it this way? Should one always go through `sh`? – domsson Feb 20 '18 at 12:07
  • Well, the shell interpreter can do all kinds of stuff such as globbing (*). If you don't need that, by instead using fork+exec (or posix_spawn) you can indeed save one extra fork+exec. – janneb Feb 20 '18 at 12:09
  • @domsson: the reason behind this standard is existing practices. This mechanism was often used for email and lpd filters, allowing the admins to add or edit the filters *without compilation* – joop Feb 20 '18 at 13:10
3

The 2004 version of the POSIX system() documentation has a rationale that is likely applicable to popen() as well. Note the stated restrictions on system(), especially the one stating "that the process ID is different":

RATIONALE

...

There are three levels of specification for the system() function. The ISO C standard gives the most basic. It requires that the function exists, and defines a way for an application to query whether a command language interpreter exists. It says nothing about the command language or the environment in which the command is interpreted.

IEEE Std 1003.1-2001 places additional restrictions on system(). It requires that if there is a command language interpreter, the environment must be as specified by fork() and exec. This ensures, for example, that close-on-exec works, that file locks are not inherited, and that the process ID is different. It also specifies the return value from system() when the command line can be run, thus giving the application some information about the command's completion status.

Finally, IEEE Std 1003.1-2001 requires the command to be interpreted as in the shell command language defined in the Shell and Utilities volume of IEEE Std 1003.1-2001.

Note the multiple references to the "ISO C Standard". The latest version of the C standard requires that the command string be processed by the system's "command processor":

7.22.4.8 The system function

Synopsis

#include <stdlib.h>
int system(const char *string);

Description

If string is a null pointer, the system function determines whether the host environment has a command processor. If string is not a null pointer, the system function passes the string pointed to by string to that command processor to be executed in a manner which the implementation shall document; this might then cause the program calling system to behave in a non-conforming manner or to terminate.

Returns

If the argument is a null pointer, the system function returns nonzero only if a command processor is available. If the argument is not a null pointer, and the system function does return, it returns an implementation-defined value.
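
A small sketch of how the two behaviours quoted above combine in practice: first ask `system(NULL)` whether a command processor exists, then run the command and decode the result. Decoding with `WIFEXITED`/`WEXITSTATUS` relies on the POSIX rules for the return value, not on ISO C alone, and `run_if_shell_available` is a made-up helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/wait.h>

/* Hypothetical helper: run `command` through system() only if a command
 * processor is available. Returns the command's exit status, or -1 if
 * there is no command processor or the command did not exit normally. */
int run_if_shell_available(const char *command)
{
    /* ISO C: a null argument only queries for a command processor. */
    if (system(NULL) == 0)
        return -1;

    int status = system(command);
    /* POSIX: the value has the format used by waitpid(). */
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

On a typical Linux system, `run_if_shell_available("exit 3")` should return 3, because the string is interpreted by sh.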

Since the C standard requires that the system's "command processor" be used for the system() call, I suspect that:

  1. Somewhere there's a requirement in POSIX that ties popen() to the system() implementation.
  2. It's much easier to just reuse the "command processor" entirely since there's also a requirement to run as a separate process.

So this is the glib answer twice-removed.

Andrew Henle
  • Thank you for this very thorough answer. I realize now that I should have asked _"What's the benefit [...]"_ instead of _"What's the reason [...]"_, but this is equally interesting and relevant. – domsson Feb 20 '18 at 12:41
  • @domsson I suspect the benefit is "that's how it was done prior to standardization, and we didn't want to break things". – Andrew Henle Feb 20 '18 at 12:42