32

The programs sed and awk usually do their work quietly. Is there any way to get these programs to state what they are doing?

Village

9 Answers

9

This is based on potong's answer. The following code replaces 'll' with 'zz', creates a backup file, displays the new text, and writes the change(s) into the file.

$ echo hello > test
$ sed -e 's/ll/zz/;w /dev/stdout' -i .backup test
hezzo
$ cat test
hezzo
$ cat test.backup 
hello
Paul
  • In newer sed versions, if the semicolon before the w is omitted, the w becomes an argument to the "s" sed command and only the changed lines are written to /dev/stdout, which to me is more useful than writing the entire file – Jack Jan 31 '17 at 19:41
  • 2020: g must be added for a global replace -> sed -i 's/oldtext/newtext/gw /dev/stdout' – MarcoZen Jun 17 '20 at 15:27
6

Assuming you are redirecting your sed output to a file, you can use the tail command (in another terminal) to continuously watch the end of that file and see the progress.
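
For example, you might run something like this (the substitution and input file name are placeholders):

sed 's/foo/bar/g' big_input.txt > output_from_sed.txt

and then, in the other terminal: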

tail -f output_from_sed.txt
amasmiller
5

If you're redirecting the output of sed or awk to a file (instead of modifying files in-place) you can give pv ("pipe viewer") a shot:

sed -e '...' input.txt | pv > output.txt

You can use pv -l to make it report the progress in lines written. The progress status gets printed to stderr while the actual data cruises along from stdin to stdout.
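
For example, to see a running count of lines written (the substitution here is just a placeholder):

sed -e 's/foo/bar/' input.txt | pv -l > output.txt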

Eduardo Ivanec
5

This might work for you (for sed):

sed -i 's/foo/bar/;w /dev/stdout' files*

This will print the contents of the file after applying the change.

Luc
potong
  • For me the -i in the original command doesn't work on OS X. Is it missing -e? – Paul Dec 04 '12 at 10:24
  • Comment by Paul: Worked example using sed for OS X: `$ echo hello > test $ sed -e 's/ll/zz/;w /dev/stdout' -i .backup test hezzo $ cat test hezzo $ cat test.backup hello` – StuartLC Dec 04 '12 at 10:24
  • Paul: Mac OS X uses the BSD version of sed, which works differently in some ways from the GNU version commonly found on Linux systems. With BSD sed you must always specify *some* extension for -i, while GNU sed interprets a missing extension as an empty string. So the BSD sed equivalent of GNU sed's `sed -i` is simply `sed -i ''`. StuartLC's examples work the same on either version because he always supplies a non-zero-length extension, so he doesn't run into this small difference. – robo Dec 19 '13 at 16:37
4

You can always tell awk to print the current input record, e.g.:

 awk '{ 
       print "#dbg:$0="$0 
       # do more stuff
       print $1
       # or make it conditional
       if ($0 ~ /specialRegEx/){
              print "#dbg:$0="$0 
       }
      }' infile

With sed, you can use the 'p' command to print each line (although printing each line is already the default, unless you run sed with -n). Something like:

 sed 'p
      # also "=" prints line # being processed
      =
      /specialRegEx/{
        s/xxx/yyy/
        p
      }' infile

I hope this helps.

shellter
3

If you are on Linux, you can view the progress of a process that is working through a large file by looking into /proc/<pid>/fdinfo. There is an entry there for each open file descriptor, and if you cat an entry it shows the read/write position of that file descriptor, so you can see that you are, say, 1123456 bytes into the file. The path names of the open files are in another area, /proc/<pid>/fd, represented as symlinks.
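
For example (the PID 12345 and fd number 3 are placeholders for your actual process):

ls -l /proc/12345/fd        # symlinks reveal which file is open on which descriptor
cat /proc/12345/fdinfo/3    # the "pos:" field is the current byte offset into that file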

Before I look at that, I usually attach strace to the process: strace -p <pid>. You can use it to watch which system calls the process is making: file reads and writes, and memory allocations with brk or mmap.

Kaz
  • For convenience: `ps ax | grep foo` where `foo` is the name of the program in question can be used to find the process id for use above. – David Kuhta Oct 28 '17 at 02:36
2

This may not be exactly what you're looking for but it may help someone else. FWIW:
gawk -W dump-variables=/tmp/awk.log
will dump the values of the script's global variables to the log file when the script finishes.
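
A full invocation might look like this (the script and data file names are placeholders; --dump-variables is the long-option spelling of the same switch):

gawk --dump-variables=/tmp/awk.log -f myscript.awk input.txt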

GuruM
2

The "right" answer here is

pv myfile.txt | sed ...

Eduardo Ivanec's answer was close, but by letting pv read the input file itself and do the actual piping, pv knows the file's total size, so it can report your progress through the file as a percentage, along with stats such as MB/s and total data transferred.

pv works like cat: it reads the file and writes it straight to stdout, or, when placed in the middle of a pipeline, it acts as a bridge between stdin and stdout.

Importantly, since pv is a "transparent" pipe process, stdout is occupied with relaying the data, so the progress report is printed to stderr.
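
A concrete example (the substitution and file names are placeholders):

pv myfile.txt | sed 's/foo/bar/g' > output.txt

The progress bar and percentage appear on the terminal while the transformed text goes to output.txt.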

Sir Robert
0

awk output to /dev/stderr

I sometimes process large data files that come in blocks of 4 lines (FASTQ), so I often use stderr to output status messages at regular intervals (here, every 100K records, i.e. every 400K lines). Here's a basic template:

#!/usr/bin/awk -f

BEGIN {
    # Check for any expected input variables

    # Status
    print "[INFO] Initiating processing..." > "/dev/stderr";
}
{
    # Do stuff

    # Status
    if (NR % 400000 == 0) {
        printf("[INFO] %d reads processed\n", NR/4) > "/dev/stderr";
    }
}
END {
    # Final status
    printf("[INFO] %d total reads\n", NR/4) > "/dev/stderr";
}
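
A hypothetical invocation, assuming the template above is saved as progress.awk (the file names are placeholders):

chmod +x progress.awk
./progress.awk reads.fastq > processed_output.txt

The [INFO] messages show up on the terminal via stderr even when stdout is redirected to a file.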
merv