I was hesitant to post this question because I assumed someone somewhere had asked it already but after much scouring, I've come up empty, so here it is.

BACKGROUND: I'm running a local agent (written in C, listening via TCP) which allows for execution of a small number of scripts/commands remotely. (Via a web interface, to be specific.) The scripts themselves are a mixture of binaries, bash, or perl scripts and the agent itself doesn't really care, as long as they are allowed in the list.

(This is on a corporate, internal network and this is in the very early stages, so please don't debate the merits of security at this time.)

The C agent code to launch processes is this:

sprintf(mrun, "%s %s 2>&1", file, args);
mexec = popen(mrun, "r");
[read some returned buffer]
pclose(mexec);

This approach works well for both external bash and perl scripts, provided the scripts just execute commands (or do things in the foreground). However, I recently had a need to expand a script to include a restart of a daemon, in this case, named.

The script itself (bash) is simple:

#!/bin/bash
pkill -9 named
/local/mnt/named/sbin/named -c /local/mnt/named/var/named.conf &
echo "restarted"

The problem I am running into is that the script never finishes (i.e. "restarted" is never echoed) when run via the C agent, so control is never returned and the TCP socket never gets freed up. As far as the agent is concerned, the process is still running. If I run the script from a terminal, it works fine and control is returned to me.

Am I missing something that would allow the script to execute normally when being forked off from a C daemon versus just being called from the bash terminal?

I know of nohup and I guess I could use that if all else fails, but I was curious whether there is some other kind of workaround for doing this.

Randy
  • Does your "C agent" read all of the output? IOW, is it waiting for its input to be closed by the other side? If so, it may be expected that it never finishes: named, started in the background, keeps the handle to stdout and is still running. So even if the restarting script has finished, there is still output to wait for (from named). Try providing named with its own handles for everything: `.../named ... </dev/null >/dev/null 2>&1 &` and see if it helps – fork0 Jul 26 '12 at 20:42
  • It only reads the buffer once, so it's not waiting for input. The changes you suggested helped, somewhat. The script itself seems to finish and I get the echo now. But that introduced a different problem: it doesn't appear that the agent can close out the socket connection once the script is done. It just hangs, until I kill named. So, now it looks like I have a whole other issue with the spawned process keeping the original TCP connection open, even though it shouldn't and they don't use the same port. – Randy Jul 26 '12 at 22:03
  • @Randy: Try doing `shutdown` passing in `SHUT_RDWR` for the how on the socket before calling `close`. The socket `fd` was probably duplicated to the child processes. – jxh Jul 26 '12 at 22:51
  • Unfortunately, that didn't fix it. The connection still stays open. Probably close to what is happening, though. – Randy Jul 26 '12 at 23:13
  • You might have more than the stdio handles leaked. And named can even pick them up as if they were its own. Can you modify the C agent to close all handles and redirect the stdio handles to `/dev/null` (or your logging/tracing handle) before it execs the script? And reset signals to defaults (in case something sets SIGALRM and the like)? – fork0 Jul 27 '12 at 07:03
  • I can modify the C agent to do whatever I want. Unfortunately, I'm not 100% sure how to accomplish what you're talking about. I don't know that stdio is an issue anymore, after your original fix. I'm thinking that the socket is duplicated once the process forks, as user315052 suggested. But I guess I'd happily take suggestions on how to solve either (or both) issues. – Randy Jul 27 '12 at 08:26

1 Answer

Based on feedback from the comments above, I was able to get the script to continue working after launching the daemon process, thanks to some additional redirects:

/local/mnt/named/sbin/named -c /local/mnt/named/var/named.conf </dev/null &> /dev/null &

So, thanks to fork0 for that bit of knowledge.
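For anyone wondering why the redirects matter: a read on the stream from popen only reaches EOF once every process holding the write end of the pipe has closed it, and a backgrounded daemon that inherited stdout/stderr still holds it. Here is a rough sketch of a capture helper that reads until EOF. The name `run_and_capture` and its signature are made up for illustration; this is not the agent's actual code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/wait.h>

/* Hypothetical helper: run cmd through popen(), copy its output into
 * out (at most cap-1 bytes, always NUL-terminated) and return the
 * command's exit status, or -1 on failure. */
int run_and_capture(const char *cmd, char *out, size_t cap)
{
    FILE *p;
    size_t used = 0, n;
    int status;

    if (cap == 0)
        return -1;
    out[0] = '\0';

    p = popen(cmd, "r");
    if (p == NULL)
        return -1;

    /* fread() returns 0 at EOF, which only happens once every process
     * that holds the pipe's write end has closed it. A background
     * daemon that kept stdout/stderr would stall this loop. */
    while (used < cap - 1 &&
           (n = fread(out + used, 1, cap - 1 - used, p)) > 0)
        used += n;
    out[used] = '\0';

    /* pclose() waits only for the shell that popen() started. */
    status = pclose(p);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With a command like `sleep 2 </dev/null >/dev/null 2>&1 & echo restarted`, the background job no longer holds the pipe, so the loop should see EOF as soon as the shell exits instead of waiting for the background job.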

Afterward, I noticed that the TCP socket connection wouldn't close properly, even though the script had finished. After some more input in the comments and a lot of research, it turns out that child processes inherit (and keep open) file descriptors from the parent process, sockets included. The connection is not torn down until every process holding a copy of the socket has closed it, so the running named was keeping it alive.
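One way to stop the inheritance on the C side, sketched here under the assumption that the agent can touch its socket descriptors right after `accept()`/`socket()`, is to mark them close-on-exec. The shell that popen starts is exec'd, so descriptors flagged this way should not reach the script or named. The helper name `set_cloexec` is hypothetical.

```c
#include <fcntl.h>

/* Hypothetical helper: mark fd close-on-exec so that programs started
 * via exec (including the shell run by popen()) do not inherit it.
 * Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}
```

In the agent this would be called once on the listening socket and once on each accepted connection, before any script is launched. I picked `fcntl` because it is plain POSIX and should behave the same on Solaris and Linux.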

I looked all over for methods to disown the child process but didn't really find any that would work for me (or didn't constitute an entire rewrite of the agent).

Finally, I stumbled upon this question, which is related but not in a programming language I use:

os.execute without inheriting parent's fds

This basically involved the child process closing the file descriptors it inherited before doing its real work. Once the child no longer holds a copy of the socket, the parent's close can actually release it. (I think?)
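If the agent did its own fork/exec instead of popen (which it currently does not, so treat this as an assumption), the same idea could be expressed in C by closing everything above the stdio descriptors in the child, between `fork()` and `exec`. The helper name `close_fds_from` and the 65536 cap are my own choices, not from any library.

```c
#include <unistd.h>

/* Hypothetical helper: close every descriptor numbered lowfd or above.
 * Meant to be called in the child after fork() and before exec, with
 * lowfd = 3 so that stdin/stdout/stderr survive. */
void close_fds_from(int lowfd)
{
    long maxfd = sysconf(_SC_OPEN_MAX);
    long fd;

    /* Assumption: if the limit is unknown or very large, 65536 is a
     * reasonable upper bound for a small agent. */
    if (maxfd < 0 || maxfd > 65536)
        maxfd = 65536;

    for (fd = lowfd; fd < maxfd; fd++)
        close((int)fd);  /* EBADF on unused descriptors is harmless */
}
```

This is a brute-force loop, the same thing the shell loop does, just without spawning nawk and sed for each descriptor.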

I added a few lines to the bash script to do this prior to starting named and it does work.

for i in `nawk 'BEGIN{ for(i=1;i<=255;i++) print i}'`
do
eval exec `echo $i | sed -e 's/.*/&<\&-/'`
done

(I wound up using nawk instead of seq because I need it to run on Solaris and Linux.)

Some basic testing shows that this has solved the major issue of the socket not being able to close but I'll need to do some more research on whether this will have any other ramifications that I am not aware of. There may also be a better, safer way to achieve this but at least I'm on the right track.

Randy