
So, I'm seeing this output and I'm a bit surprised:

$ echo "a,b,c,d,e,f,g" | cut -d, -f-4
a,b,c,d
$ echo "a,b,c,d,e,f,g" | cut -d, -f6-
f,g
$ echo "a,b,c,d,e,f,g" | awk '{ print $0 | "cut -d, -f-4"; print $0 | "cut -d, -f6-"; }'
f,g
a,b,c,d

(As a side note, I realize this is a completely silly thing to do in awk, but it's the only command I've seen it happen with!)

As I understand it, this should pipe the record into the two commands in order, but for some reason the output appears reversed. If I do this instead:

$ echo "a,b,c,d,e,f,g" | awk '{ print $0 | "echo hello"; print $0 | "echo goodbye"; }'
hello
goodbye

then everything comes out in the order I expected. I suspect this is some sort of race condition, but I'm surprised that awk doesn't wait for the subcommand in the pipe to finish. Is this a known issue with awk, or something peculiar to gawk? Is there any way to avoid this pitfall?

EDIT:

I tried it with mawk too: same (reversed) result, and it seems to happen consistently with both.

FatalError
  • Testing this using `date`, I had output in both orders. `echo "a,b,c,d,e,f,g" | awk '{ print $0 | "date \"+A%s.%N\""; print $0 | "date \"+B%s.%N\""; }'` – Dennis Williamson Apr 25 '12 at 16:01

2 Answers


To ensure that an external command has completed before the next one starts, you must close the command. Calling close() on an output pipe flushes it and waits for the command to exit.

$ echo "a,b,c,d,e,f,g" | awk 'BEGIN {cmd1 = "cut -d, -f-4"; cmd2 = "cut -d, -f6-"} { print $0 | cmd1; close(cmd1); print $0 | cmd2; close(cmd2)}'
a,b,c,d
f,g
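The same idea extends to multi-line input. The following is only a sketch: the function name `split_in_order` and the awk variable names are made up for illustration. Because each pipe is closed right after it is written to, a fresh `cut` is started for every record and finishes before the next command begins.

```shell
# Hypothetical helper: for each CSV record on stdin, print fields 1-4,
# then fields 6 onward, in that order.
split_in_order() {
  awk '
    BEGIN { first = "cut -d, -f-4"; last = "cut -d, -f6-" }
    {
      print $0 | first; close(first)   # close() waits for this cut to exit
      print $0 | last;  close(last)    # only then is the second cut started
    }'
}

printf 'a,b,c,d,e,f,g\n1,2,3,4,5,6,7\n' | split_in_order
# a,b,c,d
# f,g
# 1,2,3,4
# 6,7
```

The trade-off is one process per command per record, which is fine for a demonstration but slow on large inputs.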
Dennis Williamson

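I am surprised by this too, but awk evidently does not wait for a piped command to finish: it starts the command on the first write to the pipe and carries on, so separate commands can run concurrently. Try this (one caveat: awk identifies a pipe by its command string, so the two identical "sleep 2" strings below refer to a single pipe; use two different strings to time two separate commands):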

# time echo "a,b,c,d,e,f,g" | awk '{ print $0 | "sleep 2"; print $0 | "sleep 2"; }'

real    0m2.250s
user    0m0.030s
sys     0m0.060s
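For contrast, here is a rough sketch of the same experiment with close() after each write. The function name `timed_sequential` is hypothetical, and `sleep 1` is used instead of `sleep 2` just to keep it quick. Since close() waits for the command to exit, and a closed command string starts a new process the next time it is used, the two sleeps should run back to back and the elapsed time should be at least two seconds.

```shell
# Hypothetical helper: run two one-second sleeps through awk pipes,
# closing each pipe before the next write, and print elapsed whole seconds.
timed_sequential() {
  start=$(date +%s)
  echo "a,b,c,d,e,f,g" | awk '{
    cmd = "sleep 1"
    print $0 | cmd; close(cmd)   # waits for the first sleep to exit
    print $0 | cmd; close(cmd)   # starts a second sleep and waits again
  }'
  end=$(date +%s)
  echo $((end - start))
}

timed_sequential   # prints a number of seconds, expected to be 2 or more
```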
Benj