Why does /bin/sh hang when ran via execl on this particular script?

Question

Can you give me some ideas on the following code? The code runs but it doesn't exit. The other string, s="ls -1" works well. Running the shell snippet via sh(1) works just fine also.

#include <unistd.h>
#include <string.h>

int
main(int argc, char *argv[])
{
    int fd[2];

    char *s = "ls -1 \"/usr/bin\" | while IFS= read -r fp\ndo\ncat <<- EOF\n\t$fp\nEOF\ndone;";
    //char *s = "ls -1";


    pipe(fd);

    switch(fork()) {
        case 0:
            close(fd[1]);
            dup2(fd[0], 0);
            execl("/bin/sh", "sh", NULL);
            close(fd[0]);
        break;
        default:
            close(fd[0]);
            write(fd[1], s, strlen(s) + 1);
            close(fd[1]);
        break;
    }

    return 0;
}

You aren't closing enough file descriptors in the child. **Rule of thumb**: If you [`dup2()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/dup2.html) one end of a pipe to standard input or standard output, close both of the original file descriptors from `pipe()` as soon as possible. In particular, that means _before_ using any of the [`exec*()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/execvp.html) family of functions. The rule also applies with either `dup()` or [`fcntl()`](http://pubs.opengroup.org/onlinepubs/9699919799/functions/fcntl.html) with `F_DUPFD`. — Jonathan Leffler, Apr 19 '19 at 18:40
In this case, the `close(fd[0]);` needs to be before the `execl()` — remember, if it is successful, `execl()` never returns (it only returns on failure). You should arguably have an error report and `exit(1);` or similar after the `execl()`; at the moment, you report success even on failure. — Jonathan Leffler, Apr 19 '19 at 18:40
@JonathanLeffler, I was wondering about the order of `execl` and `close(fd[0]);` and your comments make sense, however, switching their order still doesn't work. It behaves the same way, doesn't exit. — Adrian Emil Grigore, Apr 19 '19 at 18:57
See my answer — yes, I agree that the comments aren't the solution to your problem, though the close after the exec is not executed and there should be an error message there. — Jonathan Leffler, Apr 19 '19 at 19:00
Might also be related to the big number of heredocs and tmp files. — Adrian Emil Grigore, Apr 19 '19 at 19:16
It's much easier on humans to use semi-colons as a separator rather than newlines: `while IFS= read -r fp; do cat`. If you're going to use unnecessary quotes, use single quotes: `"ls -l '/usr/bin'...` so you don't need to escape them. — William Pursell, Apr 19 '19 at 20:07
are you sure that it hangs, or it just fails to print the cursor after it had run? anyways, **a)** you shouldn't write the terminating NUL byte to the child (change `strlen(s) + 1` to `strlen(s)`) **b)** you should [`wait(2)`](http://man7.org/linux/man-pages/man2/wait.2.html) for your child. — , Apr 19 '19 at 21:12
please also mention (in your Q) your system; I'm not able to reproduce your problem. — , Apr 19 '19 at 21:19
regarding: `switch(fork()) {` The function `fork()` has three kinds of returned values:: 1) <0 means an error occurred 2) ==0 means the child process is running 3) >0 means the parent is running. The code needs to check for all three conditions — user3629249, Apr 19 '19 at 22:57

Jonathan Leffler · Answer 1 · 2019-04-19T19:10:01.580

When I was converting my comments into an answer, I tested the proposed change, and it didn't fix the problem. One fix that ought to work is to add ; exit at the end of the string, even though that's tantamount to cheating. However, testing that also shows it doesn't finish; it is as if the ls isn't terminating.

I went to another terminal to see whether the processes for ls or pipe97 (my name for your code) were still around; they weren't.

Try typing ps to your 'hung' process.

You should get a normal output and your prompt.

Because the parent doesn't wait for the child to exit, the prompt is lost somewhere in the output from ls (which is quite long and produced quite slowly). Redirect the output to /dev/null and you'll see your prompt.

The best fix is probably to add a wait() loop in the parent process, so it doesn't exit until after the child does.

#include <sys/wait.h>

…
int corpse;
int status;
while ((corpse = wait(&status)) > 0)
    ;

While debugging, you can print corpse and status so you can see what's going on.

I observe that this comment (without its preamble) remains valid:

…the close(fd[0]); needs to be before the execl() — remember, if it is successful, execl() never returns (it only returns on failure). You should arguably have an error report and exit(1); or similar after the execl(); at the moment, you report success even on failure.

By itself, the Rule of Thumb comment is valid, but it is not actually applicable to this code — unclosed pipe descriptors aren't causing the trouble, even though one pipe descriptor that should have been closed was not closed.

score 1 · Answer 2 · answered Apr 20 '19 at 23:02

@WilliamPursell Thanks, you're right. I wasn't focusing on that on this example.

@user3629249 I'm aware I'm not doing proper error checking, thank you!

@mosvy It was OpenBSD, indeed it ran on a Mac.

This version works:

#include <string.h>
#include <unistd.h>
#include <sys/wait.h>

int
main(int argc, char *argv[])
{
    int fd[2];

    char *s = "ls -1 '/usr/bin' | while IFS= read -r fp\ndo\ncat <<- EOF\n\t$fp\nEOF\ndone;";
    //char *s = "ls -1";


    pipe(fd);

    switch(fork()) {
        case 0:
            close(fd[1]);
            dup2(fd[0], 0);
            execl("/bin/sh", "sh", NULL);
            close(fd[0]);
        break;
        default:
            close(fd[0]);
            write(fd[1], s, strlen(s));
            close(fd[1]);
            wait(NULL);
        break;
    }

    return 0;
}

It was the wait(2) that did it. I remember I did wait(2) at some point, however I think I screwed the close(1) order so it was blocking.

Anyway, problem fixed, thanks!

Why does /bin/sh hang when ran via execl on this particular script?

2 Answers2