$ awk --version
GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0)
Copyright (C) 1989, 1991-2019 Free Software Foundation.

I ran the following three similar commands, which try to use $1 and $2 as integers. In each one I use sub() to strip a non-numeric leading character, @.

However, if sub() operates on $1 specifically rather than on the whole $0, the result is not converted to an integer afterwards.

And if sub() finds no match in $1, the conversion works fine again:

$ echo @101 9 | awk '{sub(/^@/, "", $0); print "("$2" < "$1") is " ($2 < $1)}'
(9 < 101) is 1

$ echo @101 9 | awk '{sub(/^@/, "", $1); print "("$2" < "$1") is " ($2 < $1)}'
(9 < 101) is 0

$ echo  101 9 | awk '{sub(/^@/, "", $1); print "("$2" < "$1") is " ($2 < $1)}'
(9 < 101) is 1

So I am not sure whether this is a bug or expected behavior. If it is expected, I would like to understand the reason behind it.

I expected the 2nd case to produce the same result as the 1st and 3rd cases.


Update 1:

I added type dumping:

$ cat dump-args.awk

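function dump(text) {
    # Print the label followed by gawk's typeof() for $0, $1 and $2.
    # The label goes through "%s" so a "%" in it is not read as a format.
    printf "%s, $0 is %s, $1 is %s, $2 is %s\n", text, typeof($0), typeof($1), typeof($2)
}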

$ echo @101 9 | awk '@include "dump-args.awk"; { dump("Initially"); sub(/^@/, "", $0); dump("After sub"); print "("$1" > "$2") is " ($1 > $2)}'
Initially, $0 is string, $1 is string, $2 is strnum
After sub, $0 is string, $1 is strnum, $2 is strnum
(101 > 9) is 1

$ echo @101 9 | awk '@include "dump-args.awk"; { dump("Initially"); sub(/^@/, "", $1); dump("After sub"); print "("$1" > "$2") is " ($1 > $2)}'
Initially, $0 is string, $1 is string, $2 is strnum
After sub, $0 is string, $1 is string, $2 is strnum
(101 > 9) is 0

$ echo  101 9 | awk '@include "dump-args.awk"; { dump("Initially"); sub(/^@/, "", $1); dump("After sub"); print "("$1" > "$2") is " ($1 > $2)}'
Initially, $0 is string, $1 is strnum, $2 is strnum
After sub, $0 is string, $1 is strnum, $2 is strnum
(101 > 9) is 1

Thanks to some comments and this info, it is now clearer when the type of $1 may change and when it gets fixed. But...


Update 2:

Most explanations don't highlight the following difference, which I just found while reducing the test case:

$ echo @101 9 | awk '{ sub(/^@/, "", $1); print ($1 > $2)}'
0

$ echo  @91 9 | awk '{ sub(/^@/, "", $1); print ($1 > $2)}'
1

The types are exactly the same as with @101:

$ echo  @91 9 | awk '@include "dump-args.awk"; { dump("Initially"); sub(/^@/, "", $1); dump("After sub"); print "("$1" > "$2") is " ($1 > $2)}'
Initially, $0 is string, $1 is string, $2 is strnum
After sub, $0 is string, $1 is string, $2 is strnum
(91 > 9) is 1
saulius2
  • @anubhava: what OS and what Awk version did you use for test, please? – saulius2 Mar 30 '22 at 20:08
  • @anubhava: my concern is about why it still works without `+0` if I manipulate the whole `$0`. – saulius2 Mar 30 '22 at 20:09
  • with `GNU awk 5.1.1` I get the same results as OP; for the 2nd case the first time `$1` is referenced it is treated as a string (`@101`) so from here to the end of processing for the line `$1` will be treated as a string; to 'convert' `$1` to a numeric you can do as anubhava suggested ... `$1+0`; in the first case the first time `$1` is referenced is after the `sub()` so `$1` could be treated as a string or numeric, so when we get to the comparison `awk` sees the other side appears to be numeric so `$1` is also treated as numeric – markp-fuso Mar 30 '22 at 20:12
  • I suspect the answer is in here somewhere: https://www.gnu.org/software/gawk/manual/html_node/Typing-and-Comparison.html – glenn jackman Mar 30 '22 at 20:13
  • On BSD (OSX) it worked without `+0` but as I commented earlier you should always use it like: `echo '@101 9' | awk '{sub(/^@/, "", $1); print ($2 < $1+0)}'` to convert string to number – anubhava Mar 30 '22 at 20:15
  • for the 3rd case same issue ... could be string or numeric until we get to the comparison at which point `awk` sees what looks like a numeric on the opposite side of the comparisons so `$1` is treated as a numeric; when in doubt ... `+0` to force a numeric – markp-fuso Mar 30 '22 at 20:15
  • @markp-fuso: thanks for the hints. It's still unclear to me why type of `$1` remains undecided in the 1st case even if I modify the original content of it. – saulius2 Mar 30 '22 at 20:19
  • The difference between case 1 and case 2 is that, in the first case, the `'@'` is removed before `$1` is referenced. – David C. Rankin Mar 30 '22 at 20:32
  • gnu-awk has an `int()` function; use `int($2) < int($1)` instead of – Jose Ricardo Bustos M. Mar 30 '22 at 20:37
  • see [gnu awk variable typing](https://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html) for some details; of interest is the `typeof()` function that could be called in multiple locations throughout OP's examples to show what `awk` thinks is `$1's` type (eg, `print typeof($1)`) – markp-fuso Mar 30 '22 at 20:39
  • Please try to make your title explicit enough someone knows if they have the same question as you without needing to click through and read the body to know what "this" is. – Charles Duffy Mar 30 '22 at 22:43
  • your 2nd update is actually confirming what we've been saying ... `$1` is being treated as a string; you appear to be suggesting that `91 > 9` returns `1` because this is being treated as a numeric comparison but it's not, it's still a string comparison, ie, *string* `91` *is* greater than the *string* `9` – markp-fuso Mar 31 '22 at 14:33
  • Thanks @markp-fuso, now this makes sense. So if I do `81 > 9`, this will return `0`, I guess. – saulius2 Mar 31 '22 at 15:11
  • @markp-fuso, OTOH `9`, the `typeof($2)` is `strnum`. It's not clear from the explanations whether this `strnum` gets converted to a temporary `string` inside the comparison. – saulius2 Mar 31 '22 at 15:17
  • `strnum` means `awk` still hasn't decided how to treat the variable ... string or number? `string` says `awk` is definitely treating the variable as a string, and at that point `awk` is looking at `string` vs `strnum(string or num?)` and decides to process the `strnum` as a `string` (ie, it becomes `string` vs `string`) – markp-fuso Mar 31 '22 at 15:25
  • That's OK but I am interested in the near future of the var. The non-decidedness (the type of `$2`) seems to remain the same (the `strnum`) after the value was read and temporarily converted to `string` for the sake of comparison. If the value was to be written back (modified), the type would change into `number` or `string` depending on the kind of modification. Right? – saulius2 Mar 31 '22 at 15:39
  • Between a) the link [gnu awk variable typing](https://www.gnu.org/software/gawk/manual/html_node/Variable-Typing.html) and b) the `typeof()` function ... you should have everything you need to run as many test scenarios as you can think of; just keep in mind the variable typing applies for `GNU awk` and may not apply for other flavors of `awk`, ymmv – markp-fuso Mar 31 '22 at 15:51
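
The string-versus-numeric distinction discussed in the comments above can be checked without any fields at all. This is a minimal sketch, assuming only standard awk comparison rules: quoted constants are compared as strings, bare constants as numbers. The variable name `result` is purely illustrative.

```shell
# Quoted constants force a string comparison (character by character);
# bare numeric constants are compared numerically.
result=$(awk 'BEGIN { print ("101" > "9"), ("91" > "9"), ("81" > "9"), (101 > 9) }')
echo "$result"   # prints: 0 1 0 1
```

As strings, `101` sorts before `9` because `1` sorts before `9`, while `91` sorts after `9` because `9` is a prefix of it. That matches the `0` and `1` seen for `@101` and `@91` in Update 2, and it confirms the guess that `81 > 9` gives `0` when compared as strings.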

1 Answer


This behavior is a feature. For example,

echo 20 101 9 | awk '{sub(/20/, "", $0); print $1}'

prints

101

because awk re-splits the record into fields whenever $0 is changed. By contrast,

echo 20 101 9 | awk '{sub(/20/, "", $1); print $1}'

prints an empty line: sub() leaves $1 holding an empty string, and changing a single field does not re-split the record. In your example, this difference determines whether $1 ends up typed as a string or as a strnum (a numeric-looking string):

echo @101 9 | awk '{sub(/^@/, "", $1); print typeof($1)}'
echo @101 9 | awk '{sub(/^@/, "", $0); print typeof($1)}'
echo @101 9 | awk '{sub(/^@/, "", $1); $0=$0; print typeof($1)}'

In the last line, $0=$0 forces the record to be re-split. These print:

string
strnum
strnum
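
Applied to the second case from the question, the two workarounds mentioned in this thread can be sketched as follows: `$1+0` (suggested in the comments) and `$0=$0` (shown above). This is an illustration rather than a definitive recipe, and the variable names `plus_zero` and `resplit` are made up for it.

```shell
# Case 2 from the question: sub() on $1 leaves $1 typed as a string.
# Workaround A: adding 0 turns $1 into a number for the comparison.
plus_zero=$(echo '@101 9' | awk '{ sub(/^@/, "", $1); print ($1+0 > $2) }')

# Workaround B: assigning $0 to itself re-splits the record, so $1 is
# read back as a fresh field before it is compared.
resplit=$(echo '@101 9' | awk '{ sub(/^@/, "", $1); $0 = $0; print ($1 > $2) }')

echo "$plus_zero $resplit"   # prints: 1 1
```

Of the two, `+0` is the more explicit choice, since it does not depend on how fields are typed after re-splitting.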
Jose Ricardo Bustos M.
  • I have no idea about how gawk represents stuff underneath, but managed to dump (`gawk -D`) the internal trace of two cases @saulius2 provided. Basically, terminology and the behavior sounds similar: https://gist.github.com/kamiccolo/6ae90b53d2822ae8ddbf6f9a2319273c#file-gistfile1-txt `Op_field_assign : [reset_record()]` vs `Op_field_assign : [invalidate_field0()]` – Kamiccolo Mar 30 '22 at 21:13
  • @Jose, this still doesn't convince me. OK, `$0` gets recompiled and the `$1` type gets changed, but: (1) I don't make `$1` empty and (2) why does `@91` get properly converted from string to integer here, in my updated question, in contrast to `@101`: https://stackoverflow.com/questions/71683391/is-this-a-gnu-awk-bug-or-a-feature#:~:text=Update%3A,-Most%20explanations%20doesn%27t – saulius2 Mar 30 '22 at 22:31
  • And if this is really a feature, then why does it behave differently on the OSX version of Awk (https://stackoverflow.com/questions/71683391/is-such-type-conversion-of-1-a-bug-or-a-feature-in-gnu-awk#comment126686872_71683391)? I need some proof of why / when it's really a feature and not a coincidence. – saulius2 Mar 30 '22 at 22:54
  • @saulius2 it seems that OSX is using a different (or rather original (?)) implementation of `awk` - `nawk`. At least according to [wiki](https://en.wikipedia.org/wiki/AWK#Versions_and_implementations) and [this](https://stackoverflow.com/a/24334941/1150918) – Kamiccolo Mar 31 '22 at 20:02