59
  1. I am using an AWK script to process some logs.
  2. At one point I need to check whether a variable's value is null or empty in order to make a decision.

Any idea how to achieve this?

awk '

{
    {
       split($i, keyVal, "@")
       key=keyVal[1];
       val=keyVal[2];
       if(val ~ /^ *$/)
       val="Y";

    }

}

' File

I have tried

1) if(val == "")

2) if(val ~ /^ *$/)

but neither works.

Benjamin W.
samarth
  • Could you provide some sample data? – Levon Aug 14 '12 at 13:10
  • your second evaluation would only equate to true if the string had at least one space and only a space character, for that you could have used `if(val ~ /^(\s*)?$/)` \s matches all space type characters (tab, space, null, newline etc), the ? makes it lazy so it'll match if the string is completely empty too – pythonian29033 Oct 28 '20 at 04:12

4 Answers

70

The comparison with "" should have worked, so that's a bit odd.
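As a quick sanity check, here is a minimal sketch showing that both tests from the question match an empty value. `check_empty` is a hypothetical helper name, not something from the question, and the value is passed with `-v` purely for illustration.

```shell
# check_empty is a hypothetical helper: it reports whether each test from
# the question considers the given value "empty".
check_empty() {
    awk -v val="$1" 'BEGIN {
        printf "%s %s\n", (val == "" ? "eq:yes" : "eq:no"), (val ~ /^ *$/ ? "re:yes" : "re:no")
    }'
}

check_empty ""      # prints: eq:yes re:yes
check_empty "   "   # prints: eq:no re:yes
check_empty "x"     # prints: eq:no re:no
```

If both tests behave like this on your system, the problem is more likely in how val gets populated than in the comparison itself.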

As one more alternative, you could use the length() function: if it returns zero, your variable is null/empty. E.g.,

if (length(val) == 0)

Also, perhaps the built-in variable NF (number of fields) could come in handy? Since we don't have access to your input data it's hard to say, but it's another possibility.
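Since no sample data was posted, the following is only a sketch on made-up input: one key@value token per line, where "b@" and "c" carry no value and one line is blank. It uses length() for the empty check and NF to skip blank lines.

```shell
# Hypothetical sample input (not from the question): key@value tokens,
# including a blank line, a key with an empty value, and a key with no "@".
out=$(printf 'a@1\n\nb@\nc\n' | awk '
NF == 0 { next }                 # skip lines that contain no fields at all
{
    split($1, keyVal, "@")
    key = keyVal[1]
    val = keyVal[2]              # a missing element simply reads as ""
    if (length(val) == 0)        # empty or missing value
        val = "Y"
    print key "=" val
}')
printf '%s\n' "$out"
# prints:
# a=1
# b=Y
# c=Y
```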

Levon
  • Watch out, `length(val)` may not be portable: https://stackoverflow.com/questions/20075845/how-to-check-if-awk-array-is-empty#comment29910443_20075924 – ericek111 Apr 06 '21 at 18:48
  • @ericek111 : yeah it's "not portable" in the sense if someone is still writing for solaris `awk` from 15 years ago… there actually *is* a safe-and-efficient way to measure `array` length without wasting time counting one at a time, while auto-detecting `awk`s that require the slow approach, and runs that branch instead. – RARE Kpop Manifesto Nov 04 '22 at 19:42
21

You can use the variable directly, without a comparison: an empty/null/zero value is considered false, and everything else is true.

For example:

# setting default tag if not provided
if (! tag) {
        tag="default-tag"
}

So in this script the variable tag has the value default-tag unless the user calls it like this:

$ awk -v tag=custom-tag -f script.awk targetFile

This is true as of GNU Awk 4.1.3, API: 1.1 (GNU MPFR 3.1.4, GNU MP 6.1.0).
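One caveat worth illustrating: because zero also counts as false, a tag of 0 passed with -v is treated like a missing one by `! tag`, even though it is not an empty string. A small sketch, where `check_tag` is a hypothetical helper name:

```shell
# check_tag is a hypothetical helper: it reports how awk sees a tag passed
# via -v, both as a truth value and compared against the empty string.
check_tag() {
    awk -v tag="$1" 'BEGIN {
        printf "%s %s\n", (!tag ? "falsy" : "truthy"), (tag == "" ? "empty" : "non-empty")
    }'
}

check_tag ""        # prints: falsy empty
check_tag "0"       # prints: falsy non-empty
check_tag "custom"  # prints: truthy non-empty
```

If 0 is a legitimate value in your data, comparing against "" is probably the safer test.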

Adrien H
15

It works just fine for me:

$ awk 'BEGIN{if(val==""){print "null or empty"}}'
null or empty

You can't differentiate between a variable being empty and being null: when you access an "unset" variable, awk just initializes it with the default value (here "", the empty string). You can use some sort of workaround, for example setting a val_accessed variable to 0 and then to 1 when you access it. A simpler (somewhat "hackish") approach is to set val to "uninitialized" (or to some other value that can't appear when running your program).

PS: your script looks strange to me; what are the nested brackets for?
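The sentinel workaround might look roughly like this. The marker string `__UNSET__` and the sample input are assumptions made for illustration; any value that cannot occur in the real data would do.

```shell
# Hypothetical sample input: "k1@" has an empty value, "k3" has no value part.
out=$(printf 'k1@\nk2@v2\nk3\n' | awk '
{
    val = "__UNSET__"                 # hypothetical sentinel, assumed absent from real data
    n = split($0, keyVal, "@")
    if (n >= 2) val = keyVal[2]       # a value part exists, possibly empty
    if (val == "__UNSET__") state = "missing"
    else if (val == "")     state = "empty"
    else                    state = "set"
    print keyVal[1], state
}')
printf '%s\n' "$out"
# prints:
# k1 empty
# k2 set
# k3 missing
```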

Alexander Putilin
1

I accidentally discovered this lesser-used, gawk-specific function that could help differentiate:

****** gawk-only ******

BEGIN {
    $0 = "abc"

    print NF, $0

    test_function()
    test_function($(NF + 1))

    test_function("")
    test_function($0)
}
function test_function(_) { print typeof(_) }

1 abc

untyped
unassigned

string
string

So it seems that, for non-numeric-looking data:

  absolutely no input to function at all         :    untyped
  non-existent or empty field,      including $0 : unassigned
  any non-numeric-appearing string, including "" :     string

Here's the chaotic part - numeric data :

  • strangely enough, for absolutely identical input, only differing between using $0 vs. $1 in function call, you frequently get a different value for typeof()

  • even a combination of both leading and trailing spaces doesn't prevent gawk from identifying it as strnum

[123]:NF:1
                 $0 =  number:123          $1 =  strnum:123           +$1 =  number:123
[ 456.33]:NF:1
                 $0 =  string: 456.33      $1 =  strnum:456.33        +$1 =  number:456.33000
[ 19683 ]:NF:1
                 $0 =  string: 19683       $1 =  strnum:19683         +$1 =  number:19683

[-20.08554]:NF:1
                 $0 =  number:-20.08554    $1 =  strnum:-20.08554     +$1 =  number:-20.08554

+/- inf/nan (same for all 4):

[-nan]:NF:1

                 $0 =  string:-nan         $1 =  strnum:-nan          +$1 =  number:-nan

this one is a string because it was made from sprintf() :

[0x10FFFF]:NF:1
               $0 = string:0x10FFFF $1 = string:0x10FFFF +$1 = number:0

using -n / --non-decimal-data flag, all stays same except

[0x10FFFF]:NF:1
               $0 = string:0x10FFFF $1 = strnum:0x10FFFF +$1 = number:1114111

Long story short, if you want your gawk function to be able to differentiate between

  • empty-string input (""), versus

  • actually no input at all

    • e.g. when original intention is to directly apply changes to $0

then typeof(x) == "untyped" seems to be the most reliable indicator.

It gets worse when comparing null-string padding against a non-empty string of all zeros:

function   __(_) { return (!_) ":" (!+_)  }
function  ___(_) { return (_ == "")       }
function ____(_) { return (!_) ":" (!""_) }


     $0--->[           "000" ] | __(""$0)-->{ !(""$0) : !+(""$0) }-->[ 0:1 ] 
 ___($0)-->{ $0=="" }-->[ 0 ]  | ____($0)-->{ !   $0  :  (!""$0) }-->[ 1:1000 ]


     $0--->[              "" ] | __(""$0)-->{ !(""$0) : !+(""$0) }-->[ 1:1 ] 
 ___($0)-->{ $0=="" }-->[ 1 ]  | ____($0)-->{ !   $0  :  (!""$0) }-->[ 1:1 ]


     $0--->[      " -0.0 -0" ] | __(""$0)-->{ !(""$0) : !+(""$0) }-->[ 0:1 ] 
 ___($0)-->{ $0=="" }-->[ 0 ]  | ____($0)-->{ !   $0  :  (!""$0) }-->[ 0:1 -0.0 -0 ]


     $0--->[          " 0x5" ] | __(""$0)-->{ !(""$0) : !+(""$0) }-->[ 0:1 ] 
 ___($0)-->{ $0=="" }-->[ 0 ]  | ____($0)-->{ !   $0  :  (!""$0) }-->[ 0:1 0x5 ]
RARE Kpop Manifesto