
I am processing many files using many separate scripts. To speed up the processing I placed them in the background using &, but in doing so I lost the ability to keep track of what they are doing (I can't see their output).

Is there a simple way of getting the output based on the PID? I found some answers based on fg [job number], but I can't figure out the job number from the PID.

econ

4 Answers


A script that is backgrounded will normally just continue to write to standard output; if you run several, their output will all be intermingled. Write it to a file instead. For example, generate an output file name using $$ (the current process ID) and write to that file:

outfile=process.$$.out
# ...
echo Output >$outfile

will write to, say, process.27422.out.

Amadan
  • Hmm. Interesting. I don't see the intermingled output (even though this is what I would also be expecting). However, I can see from one of the updated logs that the scripts are running... Also, I'm quite upset that I can't see the processed files (but if I Ctrl+C, then the processed files appear... http://stackoverflow.com/questions/36731980/writing-files-in-the-background-process?noredirect=1#comment61048244_36731980). Does this indicate that I have some problems in the script? – econ Apr 20 '16 at 04:20
  • Note that unless you take steps to ensure it is a different process executing this each time (sub-shell, for example), the processes will all write to the same file, which is not quite what's wanted. – Jonathan Leffler Apr 20 '16 at 04:57
  • @JonathanLeffler: Two backgrounded processes will surely be different processes, no? It is not guaranteed they will have different PIDs if they don't run concurrently, but a collision is not very likely. – Amadan Apr 20 '16 at 05:56
  • It is hard to say what exactly is happening, given that you didn't post any code. Could it be that you are having buffering problems? – Amadan Apr 20 '16 at 05:57
  • It depends on where the backgrounding is done. If you have `echo Output > process.$$.out &`, the `$$` is the PID of the parent shell. If you have `(echo Output > process.$$.out) &`, the value of `$$` is still that of the parent process. With `sh -c "echo Output > process.$$.out &"`, the value of `$$` is the parent process's; with `sh -c 'echo Output > process.$$.out &'` then `$$` is evaluated by the child, not the parent. You have to be rather careful, as I said. – Jonathan Leffler Apr 20 '16 at 06:01
  • Oh, yeah. I guess I could have been clearer - `echo Output > process.$$.out` was meant to be one of the commands in the script being backgrounded. A safer way would be to do `outfile=process.$$.out` and then `echo > $outfile` to guard against subshells inside the script. Let me clarify in the post. – Amadan Apr 20 '16 at 06:02
  • @Amadan: Thank you, this is helpful! If I understood this approach right, this is basically doing manually what `screen` is doing. – econ Apr 21 '16 at 04:32

You might consider running your scripts under screen, then returning to them whenever you want:

$ screen ./script.sh

To "detach" and keep the script running press ControlA followed by ControlD

$ screen -ls

Will list your screen sessions

$ screen -r <screen pid number>

Returns to a screen session

The few commands above barely touch on what screen can do, so check out its man page; you might be surprised by all its capabilities.

l'L'l
  • Yes, I used screens before and for some reason forgot about this function. Thank you! :) – econ Apr 21 '16 at 04:29

The answers by the other users are right: exec &>$outfile (or exec &>$outfifo, or exec &>$another_tty) is what you need, and it is the correct way.

However, if you have already started the scripts, there is a workaround you can use. I wrote this script to redirect the stdout/stderr of any running process to another file or terminal.

$ cat redirect_terminal
#!/bin/bash
PID=$1                 # process to redirect
stdout=$2              # new stdout target (file or terminal device)
stderr=${3:-$2}        # new stderr target; defaults to the stdout target

if [ -e "/proc/$PID" ]; then
    # Attach with gdb and make the process replace its own fds 1 and 2.
    # open(..., 1) is O_WRONLY, so the target must already exist
    # (terminal devices like /dev/pts/16 always do).
    gdb -q -n -p "$PID" <<EOF >/dev/null
        p dup2(open("$stdout",1),1)
        p dup2(open("$stderr",1),2)
        detach
        quit
EOF
else
    echo "No such PID: $PID"
fi

Sample usage:

./redirect_terminal 1234 /dev/pts/16

where:
  • 1234 is the PID of the script process.
  • /dev/pts/16 is another terminal, opened separately.

Note that the updated stdout/stderr will not be inherited by any children of that process that are already running.

anishsane
  • Use this script only for the case where you started the script first and then changed your mind about redirecting its stdout/stderr. Otherwise, use the `exec >` mechanism I mentioned at the start of my answer. – anishsane Apr 21 '16 at 04:07
  • Actually, I will be using `screen`-based solution for future scripts, but for the one that is already running (time-costly to restart) your solution is more appropriate. – econ Apr 21 '16 at 13:11

Consider using GNU Parallel - it is easily installed on OS X with Homebrew. Not only will it tag your output lines, it will also keep your CPUs busy, scheduling a new job as soon as the previous one finishes. You can make up your own tags with substitution parameters.

Let's say you have 11 files, file{10..20}.txt, to process:

parallel --tagstring "MyTag-{}" 'echo Start; echo Processing file {}; echo Done' ::: file*txt

MyTag-file15.txt    Start
MyTag-file15.txt    Processing file file15.txt
MyTag-file15.txt    Done
MyTag-file16.txt    Start
MyTag-file16.txt    Processing file file16.txt
MyTag-file16.txt    Done
MyTag-file17.txt    Start
MyTag-file17.txt    Processing file file17.txt
MyTag-file17.txt    Done
MyTag-file18.txt    Start
MyTag-file18.txt    Processing file file18.txt
MyTag-file18.txt    Done
MyTag-file14.txt    Start
MyTag-file14.txt    Processing file file14.txt
MyTag-file14.txt    Done
MyTag-file13.txt    Start
MyTag-file13.txt    Processing file file13.txt
MyTag-file13.txt    Done
MyTag-file12.txt    Start
MyTag-file12.txt    Processing file file12.txt
MyTag-file12.txt    Done
MyTag-file19.txt    Start
MyTag-file19.txt    Processing file file19.txt
MyTag-file19.txt    Done
MyTag-file20.txt    Start
MyTag-file20.txt    Processing file file20.txt
MyTag-file20.txt    Done
MyTag-file11.txt    Start
MyTag-file11.txt    Processing file file11.txt
MyTag-file11.txt    Done
MyTag-file10.txt    Start
MyTag-file10.txt    Processing file file10.txt
MyTag-file10.txt    Done
  • If you want the output in order, use parallel -k to keep the output order

  • If you want a progress report, use parallel --progress

  • If you want a log of when jobs started/ended, use parallel --joblog log.txt

  • If you want to run 32 jobs in parallel, instead of the default 1 job per CPU core, use parallel -j 32

Example joblog:

Seq     Host    Starttime       JobRuntime      Send    Receive Exitval Signal  Command
6       :       1461141901.514       0.005      0       38      0       0       echo Start; echo Processing file file15.txt; echo Done
7       :       1461141901.517       0.006      0       38      0       0       echo Start; echo Processing file file16.txt; echo Done
8       :       1461141901.519       0.006      0       38      0       0       echo Start; echo Processing file file17.txt; echo Done
Mark Setchell
  • Thank you, I already use parallel inside my scripts. I think this approach makes a lot of sense, so I might need to re-write my script a bit. – econ Apr 20 '16 at 18:45