
Can someone explain what I'm doing wrong and how to do it better?

I have a file consisting of records with field separator "-" and record separator "\t" (tab). I want to put each record on a line, followed by the line number, separated by a tab. The input file is called foo.txt.

$ cat foo.txt
a-b-c   e-f-g   x-y-z
$ < foo.txt tr -cd "\t" | wc -c
2
$ wc foo.txt
 1  3 18 foo.txt

My awk script is in the file foo.awk:

BEGIN { RS = "\t" ; FS = "-" ; OFS = "\t" }
{
    print $1 "-" $2 "-" $3, NR
}

And here is what I get when I run it:

$ gawk -f foo.awk foo.txt
a-b-c   1
e-f-g   2
x-y-z
    3

The last record is directly followed by a newline, a tab, and the last number. What is going on?

3 Answers

There is a newline character at the end of your data. It is not a record separator here, so it stays in the last field and is printed as part of $3.

In particular, the fields of the last record look like this:

$1 = "x"
$2 = "y"
$3 = "z\n"
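One way to check this yourself is to print the length of the third field for each record. This is only a sketch: it assumes the input ends with a single newline, as the wc output in the question suggests, and it feeds the sample data through printf instead of reading foo.txt.

```shell
# Sample data from the question, assumed to end with exactly one newline.
# RS is a tab, so the final newline is ordinary data inside the last record.
lengths=$(printf 'a-b-c\te-f-g\tx-y-z\n' |
  awk 'BEGIN { RS = "\t"; FS = "-" } { print NR, length($3) }')

# One line per record: record number, then length of the third field.
printf '%s\n' "$lengths"
```

The third record should report a length of 2 rather than 1, because its $3 holds "z" plus the newline.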

You can remove the trailing newline with tr before passing everything to awk:

 tr -d '\n' < foo.txt | awk -f foo.awk

or alternatively add \n to the list of field separators (as shown in the answer by Kent), since awk will strip any separators from the fields.

martin

Well, I don't know your exact goal, but since you have built this with awk, you can just add \n to FS. That removes the trailing \n without starting another process such as tr, sed, or a second awk.

BEGIN { RS = "\t" ; FS = "-|\n" ; OFS = "\t" }
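As a rough sketch of how that BEGIN line fits into the rest of the script from the question, the following runs the whole thing as a one-liner. The sample data is piped in through printf rather than read from foo.txt, which is an assumption made only to keep the example self-contained.

```shell
# Same print statement as foo.awk in the question; only FS has changed.
# With FS = "-|\n" the trailing newline acts as a field separator,
# so $3 of the last record is "z" and an empty $4 is simply ignored.
out=$(printf 'a-b-c\te-f-g\tx-y-z\n' |
  awk 'BEGIN { RS = "\t"; FS = "-|\n"; OFS = "\t" } { print $1 "-" $2 "-" $3, NR }')

printf '%s\n' "$out"
```

Each record should now be followed by a tab and its number on the same line, including the last one.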
Kent
  • Yes, that is a nice solution. – martin Aug 12 '14 at 10:27
  • What I was doing was a bit more involved (filtering some records depending on the value in one of the fields), and it seems the easiest approach would probably have been to first turn all tabs into newlines and then invoke awk on the result. Thank you for the explanation and the hint. –  Aug 12 '14 at 10:31
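The approach mentioned in the last comment, turning tabs into newlines first, might look something like the sketch below. It is not taken from any of the answers. It assumes the records themselves contain no newlines, and it uses printf for the sample data instead of foo.txt.

```shell
# Convert every tab to a newline so awk can use its default record separator.
# The original trailing newline then just terminates the last record.
out=$(printf 'a-b-c\te-f-g\tx-y-z\n' |
  tr '\t' '\n' |
  awk 'BEGIN { FS = "-" } { print $0 "\t" NR }')

printf '%s\n' "$out"
```

Because FS is still "-", a condition on an individual field could be placed in front of the action to filter records. The cost is one extra tr process.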
awk 'BEGIN { RS = "\t"; FS = OFS = "-" } { sub(/\n/, ""); print $0 "\t" NR }' file

Output:

a-b-c   1
e-f-g   2
x-y-z   3
  • ORS = "\n" was not necessary.

With GNU Awk or Mawk, which treat a multi-character RS as a regular expression, you can simply set RS = "[\t\n]+":

awk 'BEGIN { RS = "[\t\n]+"; FS = OFS = "-" } { print $0 "\t" NR }' file
konsolebox