31

I need to compare two strings in alphabetic order, not only equality test. I want to know is there way to do string comparison in awk?

codeforester
Dagang
  • Of course you can; it's primarily a string-processing language. – May 26 '11 at 13:17
  • This is a misconception. For instance the expression `$1 == $2` will falsely report that the strings `001` and `1.0` are equal. – Kaz Aug 15 '21 at 02:35

3 Answers

35

Sure it can:

pax$ echo 'hello
goodbye' | gawk '{if ($0 == "hello") {print "HELLO"}}'
HELLO

You can also do inequality (ordered) testing:

pax$ printf 'aaa\naab\naac\naad\n' | gawk '{if ($1 < "aac"){print}}'
aaa
aab
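
As the comments on this page point out, fields that look like numbers are compared numerically rather than as strings. The following is a minimal sketch, assuming a POSIX-style awk such as gawk, of one common way to force a string comparison: concatenating an empty string to each operand. The variable names `eq` and `lt` are only for illustration.

```shell
# Each awk call prints two results: the default comparison, then the
# same comparison with "" appended to both fields to force string semantics.

# "001" and "1.0" are numerically equal but differ as strings.
eq=$(echo '001 1.0' | awk '{ print ($1 == $2), (($1 "") == ($2 "")) }')

# 10 < 9 is false numerically, but "10" < "9" is true as strings.
lt=$(echo '10 9' | awk '{ print ($1 < $2), (($1 "") < ($2 "")) }')

echo "$eq"   # 1 0
echo "$lt"   # 0 1
```

Comparing against a string constant, as in `$1 < "aac"` above, already forces a string comparison, so the extra concatenation only matters when both operands come from input.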
paxdiablo
  • The operator < will only compare first letter per my experience. Hence it will not compare strings. You have to use != operator. – Sumod Jul 22 '14 at 09:49
  • @Sumod, then your implementation of `awk` is broken. In any case `!=` is useless for ordering strings as per the question. See the update for string comparison beyond the first character, and I'd suggest switching to using the GNU variant. – paxdiablo Jul 22 '14 at 11:16
  • OK. I am using awk that comes with CentOS 6.4. It says GNU awk 3.1.7. Please see the input of my commands.
    $jps - 29420 Jps, 28009 RunJar, 27501 DseDaemon. If I give the command - jps | awk '{if ($2 < "Jps") {print $2}}', then only DseDaemon is printed. If I use "!=", then both RunJar and DseDaemon are printed. Hence I reached this conclusion. Please excuse typos. Not able to copy paste exact commands.
    – Sumod Jul 22 '14 at 11:34
  • @Sumod, if your three lines are `29420 Jps`, `28009 RunJar` and `27501 DseDaemon`, then it's acting correctly. The `DseDaemon` string is the **only** one less than `Jps`. `RunJar` is greater and `Jps` is **equal** so neither of those will print. Try another line containing `11111 Jpr` and see what happens, I think you'll find it prints out fine. If you want to include the `Jps` line in your output, you should be using `<=` rather than `<`. – paxdiablo Jul 22 '14 at 12:48
  • Beware that `awk` has no explicit typing and treats input that looks numeric as a number, which sometimes leads to "interesting" results: ``` awk -v a=0200 -v b=02E2 'BEGIN{print(a==b)}' ``` Instead of a string comparison you get a numeric comparison: "02E2" is scientific notation for 02*10²=200, so the expression is true and prints 1. You can force string comparison by prefixing a string that is guaranteed not to be numeric, e.g. ``` awk -v a=0200 -v b=02E2 'BEGIN{print(("x" a)==("x" b))}' ``` – pmhahn Jun 08 '22 at 13:46
6

You can do string comparison in awk using the standard comparison operators (`==`, `!=`, `<`, `<=`, `>`, `>=`), unlike in C, where you would have to use strcmp().

echo "xxx yyy" > test.txt

awk '$1 != $2 { print $1 $2 }' test.txt
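
For readers coming from C, here is a hedged sketch of a strcmp()-style three-way comparison built from awk's ordinary operators. The names `awk_strcmp` (shell wrapper) and `strcmp` (awk function) are hypothetical helpers written for this example; awk has no such built-in. Both arguments are concatenated with `""` so that numeric-looking values are still compared as strings.

```shell
# awk_strcmp A B: prints -1, 0 or 1 depending on how A orders against B
# as strings. Hypothetical helper, not part of awk itself.
awk_strcmp() {
    awk -v a="$1" -v b="$2" '
        function strcmp(x, y) {
            x = x ""; y = y ""    # force string, not numeric, comparison
            return (x < y) ? -1 : ((x > y) ? 1 : 0)
        }
        BEGIN { print strcmp(a, b) }'
}

awk_strcmp apple banana   # -1
awk_strcmp 10 9           # -1, because "1" sorts before "9"
awk_strcmp same same      # 0
```

Note that `-v` processes backslash escapes in the assigned value, so this sketch is only suitable for arguments without backslashes.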

Ilya Matveychikov
4

You can check the answer in the nawk manual. For example:

echo aaa bbb | awk '{ print (($1 >= $2) ? "true" : "false") }'
Denim Datta
Ian Chang