
When I count the number of lines in a file using wc:

cat ~/.account | wc -l

... the result is:

384

But when I use awk:

awk 'BEGIN {x = "1.02"; y = 0; } {x = x*2; y = y + 1} END {print x; print y}' ~/.account

... the result is:

8.03800926406447389928897056654e+115

385

Why is this?

halfer
Duc Chi
  • Is the last character of the file `~/.account` a newline character? – e0k Jan 28 '16 at 04:11
  • Reduce your input file to the smallest number of lines that can reproduce the problem. At that point you will have answered the question for yourself. You do not need `y` - it is provided for you as `NR`. – Ed Morton Jan 28 '16 at 04:16
  • Hi e0k, how can I know the last character of the file ~/.account is a newline character or not? – Duc Chi Jan 28 '16 at 04:43
  • Thanks, Ed Morton. I will use NR later :) – Duc Chi Jan 28 '16 at 04:44

1 Answer


What wc -l is doing

From man wc:

-l, --lines

print the newline counts

wc -l counts the newline characters in its input, whereas awk splits its input into records that are separated by newline characters.

Consider this example:

$ echo 1 | wc -l
1
$ echo -n 1 | wc -l
0

The input for the first command (echo 1) is the string "1\n". Using -n with echo prints the 1 without a newline at the end, which makes the input just the string "1". wc -l counts the newline characters in the input. In the first case there is one newline, and in the second there are none.

What AWK is doing

AWK divides its input into records, and each record into fields. This is an important part of the parsing magic that AWK does for us.

From The GNU AWK User's Guide (but referring to standard AWK):

Records are separated by a character called the record separator. By default, the record separator is the newline character. This is why records are, by default, single lines.

But if the input ends with this separator, see what happens:

$ echo 1 | awk 'END{print NR}'
1
$ echo -n 1 | awk 'END{print NR}'
1

(NR is a special variable for "the total number of input records read so far from all data files.")

There is only one record in each case, even the first ("1\n") that contains a newline character. Since there is nothing after the separator, it separates nothing. In other words, it does not give an empty record at the end if the input ends with the separator.
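To see both tools disagree on the same input, here is a small sketch. The temporary file and the variable names (sample, newlines, records) are made up for illustration; the file holds three lines of text, and the last one has no trailing newline.

```shell
# Three lines of text; the final line is NOT terminated by a newline.
sample=$(mktemp)
printf 'a\nb\nc' > "$sample"

# wc -l counts newline characters only.
newlines=$(wc -l < "$sample")

# awk counts records, including the unterminated last one.
records=$(awk 'END { print NR }' "$sample")

# $((...)) strips the padding some wc implementations add.
echo "wc -l: $((newlines)), awk NR: $records"   # wc -l: 2, awk NR: 3

rm -f "$sample"
```

With a trailing newline added to the file, both numbers would be 3.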

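If your input file does not end in a newline character, wc -l will report one fewer than awk's number of records (NR). That matches your results: 384 newline characters but 385 records, so ~/.account most likely ends without a trailing newline.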
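As for how to tell whether a file's last character is a newline, one possible check is to look at the final byte with tail -c 1 and count the newlines in it. The function name ends_with_newline and the demo file are hypothetical, chosen only for this sketch.

```shell
# Hypothetical helper: succeeds if the last byte of file "$1" is a newline.
ends_with_newline() {
    # tail -c 1 prints the final byte; wc -l then reports 1 or 0.
    # The substitution is left unquoted so any padding from wc is dropped.
    [ $(tail -c 1 "$1" | wc -l) -eq 1 ]
}

demo=$(mktemp)

printf 'a\nb\n' > "$demo"
ends_with_newline "$demo" && echo "ends with a newline"

printf 'a\nb' > "$demo"
ends_with_newline "$demo" || echo "no trailing newline"

rm -f "$demo"
```

Running ends_with_newline ~/.account would then tell you which case your file falls into.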

e0k
  • Thank you so much. I understand your examples. So 'wc -l' counts the number of newline characters, and it depends on the input file. But 'awk' counts the number of newline-delimited records that it splits the input file into, and it doesn't depend on the input file. Is that what you mean? – Duc Chi Jan 28 '16 at 04:56
  • Well, both depend on the input file. (Giving different input can give different output.) The idea to understand is that AWK separates input into records for processing. If a separator (newline by default) appears at the very end of the input, it does not make an empty record to be processed. The `wc -l` just counts characters. – e0k Jan 28 '16 at 05:46
  • Oh I see. Thank you! – Duc Chi Jan 28 '16 at 06:01