32

The programs sed and awk usually do their work quietly. Is there any way to get these programs to state what they are doing?

Village

9 Answers

9

This is based on potong's answer. The following code replaces 'll' with 'zz', creates a backup file, displays the new text, and writes the change(s) into the file.

$ echo hello > test
$ sed -e 's/ll/zz/;w /dev/stdout' -i .backup test
hezzo
$ cat test
hezzo
$ cat test.backup 
hello
Paul
  • In newer sed versions, if the semicolon before the w is omitted, the w becomes an argument to the "s" sed command and only the changed lines are written to /dev/stdout, which to me is more useful than writing the entire file – Jack Jan 31 '17 at 19:41
  • 2020: g must be added for a global replace -> sed -i 's/oldtext/newtext/gw /dev/stdout' – MarcoZen Jun 17 '20 at 15:27
6

Assuming you are redirecting your sed output to a file, you can use the tail command (in another terminal) to continuously watch the end of that file and see the progress.
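
For example, you might run something like this (the substitution and input file name are placeholders):

sed 's/foo/bar/g' big_input.txt > output_from_sed.txt

and then, in the other terminal: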

tail -f output_from_sed.txt
amasmiller
5

If you're redirecting the output of sed or awk to a file (instead of modifying files in-place) you can give pv ("pipe viewer") a shot:

sed -e '...' input.txt | pv > output.txt

You can use pv -l to make it report the progress in lines written. The progress status gets printed to stderr while the actual data cruises along from stdin to stdout.
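
For example, to see a running count of lines written (the substitution here is just a placeholder):

sed -e 's/foo/bar/' input.txt | pv -l > output.txt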

Eduardo Ivanec
5

This might work for you (for sed):

sed -i 's/foo/bar/;w /dev/stdout' files*

This will print the contents of the file after applying the change.

Luc
potong
  • For me the -i in the original command doesn't work on OS X. Is it missing -e? – Paul Dec 04 '12 at 10:24
  • Comment by Paul: Worked example using sed for OS X: `$ echo hello > test $ sed -e 's/ll/zz/;w /dev/stdout' -i .backup test hezzo $ cat test hezzo $ cat test.backup hello` – StuartLC Dec 04 '12 at 10:24
  • Paul: Mac OS X uses the BSD version of sed, which works differently in some ways from the GNU version commonly found on Linux systems. With BSD sed you must always specify *some* extension for -i, while GNU sed interprets a missing extension as an empty string. So the BSD sed equivalent of GNU sed's `sed -i` is simply `sed -i ''`. StuartLC's examples work the same on either version because he always supplies a non-zero-length extension, so he doesn't run into this small difference. – robo Dec 19 '13 at 16:37
4

You can always tell awk to print the current input record, e.g.:

 awk '{ 
       print "#dbg:$0="$0 
       # do more stuff
       print $1
       # or make it conditional
       if ($0 ~ /specialRegEx/){
              print "#dbg:$0="$0 
       }
      }' infile

With sed, you can use the 'p' command to print each line (although printing each line is already the default, unless you run sed with -n). Something like:

 sed 'p
      # also "=" prints line # being processed
      =
      /specialRegEx/{
        s/xxx/yyy/
        p
      }' infile

I hope this helps.

shellter
3

If you are on Linux, you can view the progress of a process that is working through a large file by looking into /proc/<pid>/fdinfo. There is an entry there for each open file descriptor, and if you cat an entry it shows the read/write position of that file descriptor, so you can see that you are, say, 1123456 bytes into the file. The path names of the open files are in another area, /proc/<pid>/fd, represented as symlinks.
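
For example (the PID 12345 and fd number 3 are placeholders for your actual process):

ls -l /proc/12345/fd        # symlinks reveal which file is open on which descriptor
cat /proc/12345/fdinfo/3    # the "pos:" field is the current byte offset into that file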

Before I look at that, I usually attach strace to the process: strace -p <pid>. You can use it to watch which system calls the process is making: file reads and writes, and memory allocations with brk or mmap.

Kaz
  • For convenience: `ps ax | grep foo` where `foo` is the name of the program in question can be used to find the process id for use above. – David Kuhta Oct 28 '17 at 02:36
2

This may not be exactly what you're looking for but it may help someone else. FWIW:
gawk -W dump-variables=/tmp/awk.log
will dump the values of the script's global variables to the log file when the script finishes.
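
A full invocation might look like this (the script and data file names are placeholders; --dump-variables is the long-option spelling of the same switch):

gawk --dump-variables=/tmp/awk.log -f myscript.awk input.txt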

GuruM
2

The "right" answer here is

pv myfile.txt | sed ...

Eduardo Ivanec's answer was close, but by letting pv read the input file itself and do the actual piping, pv knows the file's total size, so it can report your progress through the file as a percentage, along with stats such as MB/s and total data transferred.

pv works like cat: it reads the file and writes it straight to stdout, or, when placed in the middle of a pipeline, it acts as a bridge between stdin and stdout.

Importantly, since pv is a "transparent" pipe process, stdout is occupied with relaying the data, so the progress report is printed to stderr.
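
A concrete example (the substitution and file names are placeholders):

pv myfile.txt | sed 's/foo/bar/g' > output.txt

The progress bar and percentage appear on the terminal while the transformed text goes to output.txt.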

Sir Robert
0

awk output to /dev/stderr

I sometimes process large data files that come in blocks of 4 lines (FASTQ), so I often use stderr to output status messages at regular intervals (here, every 100K records, i.e. every 400K lines). Here's a basic template:

#!/usr/bin/awk -f

BEGIN {
    # Check for any expected input variables

    # Status
    print "[INFO] Initiating processing..." > "/dev/stderr";
}
{
    # Do stuff

    # Status
    if (NR % 400000 == 0) {
        printf("[INFO] %d reads processed\n", NR/4) > "/dev/stderr";
    }
}
END {
    # Final status
    printf("[INFO] %d total reads\n", NR/4) > "/dev/stderr";
}
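
A hypothetical invocation, assuming the template above is saved as progress.awk (the file names are placeholders):

chmod +x progress.awk
./progress.awk reads.fastq > processed_output.txt

The [INFO] messages show up on the terminal via stderr even when stdout is redirected to a file.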
merv