
Sometimes I need to run a command like this:

cat file.txt | awk ' NR%4 == 2 { print $1 }' | sort | uniq -c | sort -gr >>output.txt &

over large files (2-32 GB in size). I start the command in the evening, and when I come back in the morning, output.txt is sometimes empty and the process is no longer running.

Please, how can I track what is happening? Why and when did my command fail? I know the pipeline works, because sometimes it just finishes successfully.

Thanks a lot!

UPDATE: I now suspect that my process was killed because the server where I run these computations is recommended for interactive use only. If that is true, the only thing I can see from the log file is that it was not successful: it did not finish.

Is there any way to find out whether my process was actually killed? Thanks.

glennsl
Perlnika

3 Answers


First step: encapsulate that pipeline in a script file rather than running it directly at the terminal (and lose the UUOC award while we're at it).

#!/bin/bash

{
  awk 'NR%4 == 2 { print $1 }' file.txt | sort | uniq -c | sort -gr >>output.txt
} 2>error.log

This captures all the error messages in the file error.log. Then you can add diagnostic information.

#!/bin/bash

{
  date >&2
  set -x
  awk 'NR%4 == 2 { print $1 }' file.txt | sort | uniq -c | sort -gr >>output.txt
  date >&2
} 2>error.log

Now you've got the when information — when it was started and when it finished. Since you're in bash, you could arrange to capture the exit statuses of each process in the pipeline if you wanted to, so you'd know exactly which commands exited with which status. You may or may not get messages about which process was killed (if a process was killed by an external signal), but if the process dies of its own accord, it should print a message on standard error (that's what it is there for, and why it is crucial that errors are printed to standard error, not standard output).
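As a sketch of that suggestion (the sample input and the status line are mine, not from the answer), bash's PIPESTATUS array holds the exit status of every command in the most recent pipeline:

```shell
#!/bin/bash
# Tiny FASTQ-like sample standing in for the real file.txt.
printf 'id1\nACGT\n+\nqual\n' > file.txt

awk 'NR%4 == 2 { print $1 }' file.txt | sort | uniq -c | sort -gr >>output.txt
# Copy PIPESTATUS immediately: any subsequent command overwrites it.
status=("${PIPESTATUS[@]}")
echo "awk=${status[0]} sort=${status[1]} uniq=${status[2]} sort=${status[3]}" >&2
```

With this in the script's error log, a non-zero entry points at exactly which stage died.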

With this script, you have standard output going to output.txt and standard error going to error.log; the script does not use standard input (the data comes from file.txt). So, you can run this with nohup or simply in the background with & without any qualms.

You may prefer to make the file name file.txt a command-line parameter; you may want to make the output and log files configurable. You might prefer a different format for the date output. All of that is tweakable. But the key point is to put it into a shell script so that you can handle such things straightforwardly and systematically.
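A minimal sketch of that parameterised version (the defaults, and the stand-in sample input that makes this self-contained, are mine, purely for illustration):

```shell
#!/bin/bash
# Stand-in input so the sketch runs on its own; replace with real data.
printf 'id1\nACGT\n+\nqual\n' > sample.txt

infile=${1:-sample.txt}     # first argument: input file
outfile=${2:-output.txt}    # second argument: where counts accumulate

{
  date >&2
  awk 'NR%4 == 2 { print $1 }' "$infile" | sort | uniq -c | sort -gr >>"$outfile"
  date >&2
} 2>error.log
```

Run it as, say, `./process.sh file.txt results.txt`; with no arguments the illustrative defaults apply.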

Jonathan Leffler
  • Thanks, putting it into a script like this definitely looks like a good idea. Combined with @damienfrancois's suggestions it might become very powerful. Thank you both! – Perlnika Oct 19 '13 at 20:12
  • @Perlnika To capture the exit status of each command in the pipeline, use the PIPESTATUS bash variable. – Austin Phillips Oct 20 '13 at 22:36

Use screen, pv and tee to capture all errors, have a progress bar, and allow restarting from the last successful command rather than from scratch upon error.

You can use screen (the multiplexer) rather than backgrounding your process. That way, you can always review its status and not miss error messages. Just type screen, run your command without the &, and hit CTRL-a,d. You can then logout. To later review the output, login (even remotely) and type screen -r.

Furthermore, if you replace the initial cat with pv (the Pipe Viewer), you get a progress bar telling you how much has been processed already:

pv file.txt | awk 'NR%4 == 2 { print $1 }' | sort | uniq -c | sort -gr >>output.txt

and you will see something like

611MB 0:00:11 [58.3MB/s] [=>      ] 15% ETA 0:00:59

whenever you reattach the screen process.

Alternatively/complementarily, you can insert tee after a command to copy its output to a file before it is propagated to the next one.

uniq -c | tee afteruniq.tmpfile | sort -gr

The file afteruniq.tmpfile will contain the result of uniq -c, so you know what worked and what failed. You can also resume the chain after the last successful step from the tee'd file.
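Here is a sketch of that resume step (the inline sample data is mine; afteruniq.tmpfile is the name used above):

```shell
#!/bin/bash
# First run: tee saves the uniq -c output while the pipeline proceeds.
printf 'b\na\na\n' | sort | uniq -c | tee afteruniq.tmpfile | sort -gr > output.txt

# If sort -gr had failed, rerun only that stage from the saved copy,
# instead of repeating the expensive earlier steps:
sort -gr afteruniq.tmpfile > output.retry.txt
```

Both files end up identical, which is exactly what makes the intermediate copy a safe restart point.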

damienfrancois
  • Not 100% your question, but `script` is sometimes handy too. I sometimes use it together with `screen`. – jerik Oct 20 '13 at 10:31

You need to redirect both stderr and stdout to your output file.

Make your command like this:

( awk 'NR%4 == 2 { print $1 }' file.txt | sort | uniq -c | sort -gr ) >> output.txt 2>&1 &

Take note of 2>&1 to redirect stderr to wherever stdout is going.
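A quick demo of my own (not from the answer) showing why the order matters: 2>&1 must come after the file redirection, because redirections are processed left to right:

```shell
#!/bin/bash
# Correct order: stdout is pointed at the file first, then 2>&1 points
# stderr at the same place, so both lines land in both.txt.
( echo out; echo err >&2 ) >> both.txt 2>&1

# Reversed ( 2>&1 >> both.txt ) would duplicate stderr onto the terminal
# *before* stdout is redirected, and the error messages would be lost.
```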

anubhava
  • Thanks. Will I also find a message there if my process gets killed? Because in this pipeline, output.txt is usually empty until the very end of its run. – Perlnika Oct 19 '13 at 18:08
  • It will capture all stdout and stderr in `output.txt` now. – anubhava Oct 19 '13 at 18:14