4

While trying to answer a question, I wrote the following GNU awk script and ran into an issue with sorting (I should have read the manual first).

From the manual:

Because IGNORECASE affects string comparisons, the value of IGNORECASE also affects sorting for both asort() and asorti(). Note also that the locale's sorting order does not come into play; comparisons are based on character values only.

This was the proposed solution:

awk '{
    lines[$0]=length($0)
}
END {
    for(line in lines) { tmp[lines[line],line] = line }
    n = asorti(tmp)
    for(i=1; i<=n; i++) {
        split(tmp[i], tmp2, SUBSEP); 
        ind[++j] = tmp2[2]
    }
    for(i=n; i>0; i--)
        print ind[i],lines[ind[i]]
}' file

Script output:

aaaaa foo 9
aaa foooo 9
aaaa foo 8
aaa foo 7
as foo 6
a foo 5
aaaaaaa foooo 13

I tried adding 0 to force a numeric type, but could not get the desired output. Is there a way to simulate a numeric sort in awk/gawk?

Input File:

aaa foooo
aaaaaaa foooo
a foo
aaa foo
aaaaa foo
as foo
aaaa foo

Desired Output:

aaaaaaa foooo
aaaaa foo     # Doesn't matter which one comes first (both are the same size)
aaa foooo     # Doesn't matter which one comes first (both are the same size)
aaaa foo
aaa foo
as foo
a foo

The numbers shown in the script output are only there to illustrate how the sorting was done.

jaypal singh

2 Answers

18

See this example, Jaypal:

kent$  cat f
3333333
50
100
25
44

kent$  awk '{a[$0]}END{asorti(a,b);for(i=1;i<=NR;i++)print b[i]}' f          
100
25
3333333
44
50

kent$  awk '{a[$0]}END{asorti(a,b,"@val_num_asc");for(i=1;i<=NR;i++)print b[i]}' f
25
44
50
100
3333333
Kent
  • +1: Thank you Kent. I should have looked at the `gawk` manual better. This helps resolve the issue. I honestly never knew there were so many options available for sorting. – jaypal singh Mar 26 '14 at 16:50
  • +1 I hadn't noticed that 3rd arg get added to asorti(), it's a useful one! – Ed Morton Mar 26 '14 at 16:53
  • This is very strange, but that last awk gives me the rows in this order: 44,100,333333,50,25. I can't figure out why. – Bruno Bronosky Sep 21 '18 at 14:02
  • @BrunoBronosky perhaps you want to check which gawk version you are using, and whether that version supports `asorti(a,b,c)` – Kent Sep 21 '18 at 16:23
  • It does sort numerically with your example file, but it does not work when the input file has bigger numbers. Do you know why? For example, the following numbers are not sorted numerically: 5695669340 4690291506 511687106 515604480 632193024 2124447744 – Chris Apr 20 '20 at 09:55
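
The inconsistent results reported in the comments may come from the sort key rather than from the size of the numbers. This is an assumption, not something confirmed in this thread: `{a[$0]}` creates elements whose values are empty, so an order based on values has no numbers to compare. Since the numbers live in the indices here, a sketch using the index-based order "@ind_num_asc" looks like this. The function name `sort_indices_numerically` and the array names are hypothetical, and the three-argument form of asorti() needs gawk 4.0 or later.

```shell
# Hypothetical helper: numerically sort the lines read from stdin.
# Assumes gawk 4.0+ (three-argument asorti); other awks will not accept this.
sort_indices_numerically() {
    gawk '
        { seen[$0] }                  # each input line becomes an array index
        END {
            # "@ind_num_asc" orders by index, treating each index as a number
            n = asorti(seen, sorted, "@ind_num_asc")
            for (i = 1; i <= n; i++) print sorted[i]
        }'
}

# The larger numbers from the last comment, as a usage example
printf '%s\n' 5695669340 4690291506 511687106 515604480 632193024 2124447744 |
    sort_indices_numerically
```

Because the lines are stored as indices, duplicate input lines collapse into one entry, just as in the original one-liner.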
10

The problem you're having is that you're calling asorti(), which sorts on array indices. By definition all awk array indices are strings, and therefore the sorting is string-based. You can pad the numbers with leading zeros so every string is the same length (e.g. 001, 010 and 100 instead of 1, 10, 100), for example with str=sprintf("%20s",num); gsub(/ /,0,str). Alternatively, store the numbers as array elements and sort those with asort() instead of sorting indices with asorti(), since array elements can be either strings or numbers.
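
As a rough sketch of the padding idea applied to the script in the question: the length is zero-padded before it goes into the index, so the default string sort of asorti() agrees with numeric order. The function name `sort_lines_by_length`, the array names, and the fixed width of 10 digits are assumptions made for illustration. It also assumes no input line contains SUBSEP.

```shell
# Hypothetical helper: print input lines from longest to shortest.
# Assumes gawk (for asorti) and line lengths that fit in 10 digits.
sort_lines_by_length() {
    gawk '
        { len[$0] = length($0) }
        END {
            for (line in len) {
                # Fixed-width key so that string order matches numeric order
                key = sprintf("%010d", len[line])
                padded[key, line] = line
            }
            n = asorti(padded, keys)        # plain string sort on the padded keys
            for (i = n; i > 0; i--) {       # walk backwards: longest first
                split(keys[i], parts, SUBSEP)
                print parts[2]
            }
        }' "$@"
}
```

Calling `sort_lines_by_length file` on the input file from the question should print the 13-character line first and `a foo` last. Duplicate lines are printed once, as in the original script.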

Ed Morton