awk sort multidimensional array

Question

GNU awk supports multidimensional arrays:

q[1][1] = "dog"
q[1][2] = 999
q[2][1] = "mouse"
q[2][2] = 777
q[3][1] = "bird"
q[3][2] = 888

I would like to sort the "second column" of q such that I am left with:

q[1][1] = "mouse"
q[1][2] = 777
q[2][1] = "bird"
q[2][2] = 888
q[3][1] = "dog"
q[3][2] = 999

As you can see, the "first column" values moved along with the second. I see GNU Awk offers an asort function, but it does not appear to support multidimensional arrays. If it helps, this is a working Ruby example:

q = [["dog", 999], ["mouse", 777], ["bird", 888]]
q.sort_by{|z|z[1]}
=> [["mouse", 777], ["bird", 888], ["dog", 999]]

I ended up using a regular array, then separating duplicates with newlines:

q[777] = "mouse"
q[999] = "dog" RS "fish"
q[888] = "bird"
for (z in q) {
  print q[z]
}

AFAIK, no language provides a built-in method to sort multi-dimensional arrays the way you want (let me know if one does), and (g)awk doesn't either, so you'll have to get your hands dirty and write your own logic. As you said, gawk provides `asort()`, and its last argument can be a user-defined comparison function, so you are free to write your own logic there. Apart from that, you can turn the 2-D array into a 1-D array and use the built-in asort(). What have you tried so far? — Kent, Jul 17 '13 at 08:21
The original question seemed to be about sorting multi-dimensional arrays and had an accepted answer (http://stackoverflow.com/a/17706399/1745001) for over 3 years, but the OP recently unaccepted that answer, posted their own answer (which was just about populating a one-dimensional array), and then updated their question to say that is what they really wanted. So this is now simply a dup of every other question about populating an associative array, and it should be closed, as it no longer makes any sense stand-alone and certainly not in reference to multi-dimensional arrays. — Ed Morton, Jan 04 '17 at 21:43

Ed Morton · Answer 1 · 2013-07-18T13:32:33.710

FWIW, here's a workaround "sort_by()" function:

$ cat tst.awk
BEGIN {
    a[1][1] = "dog"
    a[1][2] = 999
    a[2][1] = "mouse"
    a[2][2] = 777
    a[3][1] = "bird"
    a[3][2] = 888

    print "\n############################\nBefore:"
    for (i=1; i in a; i++)
        for (j=1; j in a[i]; j++)
            printf "a[%d][%d] = %s\n",i,j,a[i][j]
    print "############################"

    sort_by(a,2)

    print "\n############################\nAfter:"
    for (i=1; i in a; i++)
        for (j=1; j in a[i]; j++)
            printf "a[%d][%d] = %s\n",i,j,a[i][j]
    print "############################"

}

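function sort_by(arr,key,       keys,vals,i,j)
{
    # keys[row] = value to sort on; vals[value] = that row joined with SUBSEP
    # (rows that share the same sort value would collide in vals[])
    for (i=1; i in arr; i++) {
        keys[i] = arr[i][key]
        for (j=1; j in arr[i]; j++)
            vals[keys[i]] = vals[keys[i]] (j==1?"":SUBSEP) arr[i][j]
    }

    # Sort the values themselves (numbers compare numerically)
    asort(keys)

    # Rebuild each row of arr[] from vals[], in sorted key order
    for (i=1; i in keys; i++)
       split(vals[keys[i]],arr[i],SUBSEP)

    # Number of rows
    return (i - 1)
}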

$ gawk -f tst.awk

############################
Before:
a[1][1] = dog
a[1][2] = 999
a[2][1] = mouse
a[2][2] = 777
a[3][1] = bird
a[3][2] = 888
############################

############################
After:
a[1][1] = mouse
a[1][2] = 777
a[2][1] = bird
a[2][2] = 888
a[3][1] = dog
a[3][2] = 999
############################

It works by first converting this:

    a[1][1] = "dog"
    a[1][2] = 999
    a[2][1] = "mouse"
    a[2][2] = 777
    a[3][1] = "bird"
    a[3][2] = 888

to this:

    keys[1]   = 999
    vals[999] = dog SUBSEP 999

    keys[2]   = 777
    vals[777] = mouse SUBSEP 777

    keys[3]   = 888
    vals[888] = bird SUBSEP 888

then asort()ing keys[] to get:

    keys[1] = 777
    keys[2] = 888
    keys[3] = 999

and then looping through the keys array, using its elements as indices into the vals array, to re-populate the original array.

In case anyone's wondering why I didn't just use the values we want to sort on as indices and then do an asorti(), which would have resulted in slightly briefer code, here's why:

$ cat tst.awk
BEGIN {
   a[1] = 888
   a[2] = 9
   a[3] = 777

   b[888]
   b[9]
   b[777]

   print "\n\"a[]\" sorted by content:"
   asort(a,A)
   for (i=1; i in A; i++)
      print "\t" A[i]

   print "\n\"b[]\" sorted by index:"
   asorti(b,B)
   for (i=1; i in B; i++)
      print "\t" B[i]

}
$ gawk -f tst.awk

"a[]" sorted by content:
        9
        777
        888

"b[]" sorted by index:
        777
        888
        9

Notice that asorti() treats "9" as a higher value than "888". That's because asorti() sorts on array indices, and all array indices are strings (even if they look like numbers); compared as strings, the first character of "9" IS higher than the first character of "888". asort(), on the other hand, sorts on the contents of the array, and array contents can be strings OR numbers, so normal awk comparison rules apply: anything that looks like a number is treated as a number, and the number 9 is less than the number 888, which in this case IMHO is the desired result.

score 1 · Answer 2 · answered Jul 17 '13 at 06:48

supports true multidimensional arrays

No, it doesn't. It supports arrays of arrays, and it supports a hash indexed by a string consisting of two indices smushed together. Your syntax is the former (arrays of arrays).
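To make the distinction concrete, here is a minimal sketch of the second kind: a comma subscript such as `q[1,2]` is stored under a single string key built as `1 SUBSEP 2`, which `split()` can take apart again. This part is standard POSIX awk, so it uses plain `awk`; the wrapper name `subsep_demo` is hypothetical.

```shell
#!/bin/sh
# Plain POSIX awk is enough here; subsep_demo is a hypothetical wrapper name.
subsep_demo() {
    awk 'BEGIN {
        # q[i,j] is ONE element whose key is the string i SUBSEP j
        q[1,1] = "dog";   q[1,2] = 999
        q[2,1] = "mouse"; q[2,2] = 777

        for (i = 1; (i, 2) in q; i++)
            print i, q[i,1], q[i,2]

        # A combined key can be split back into its parts
        for (k in q)
            if (split(k, parts, SUBSEP) == 2 && parts[1] == 1 && parts[2] == 2)
                print "key (1,2) splits into", parts[1], parts[2]
    }'
}

subsep_demo
```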

That said, I don't think you can do it with builtins, since it would either require a use of a comparator callback, or alternately an ability to return a sort permutation, neither of which gawk provides, AFAIK.

But you can refer to this page, which describes how to implement qsort yourself; there you can change the comparison from A[i] < A[left] to A[i][2] < A[left][2].
