Pivot table in AWK

Question

I need to transform elements from an array to column index and return the value of $3 for each column index. I don´t have access to gawk 4 so I cannot work with real multidimensional arrays.

Input

Name^Code^Count
Name1^0029^1  
Name1^0038^1   
Name1^0053^1  
Name2^0013^3  
Name2^0018^3  
Name2^0023^5  
Name2^0025^1  
Name2^0029^1  
Name2^0038^1  
Name2^0053^1  
Name3^0018^1  
Name3^0060^1  
Name4^0018^2  
Name4^0025^5  
Name5^0018^2  
Name5^0025^1  
Name5^0060^1

Desired output

Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^  
Name2^3^3^5^1^1^1^1^  
Name3^^1^^^^^^1  
Name4^^2^^5^^^^  
Name5^^^^1^^^^1

Any suggestions on how to tackle this task without using real multidimensional arrays?

BTW - I think that the last row in your desired output is wrong. It should be `Name5^^2^^1^^^^1` — D.Shawley, Apr 05 '14 at 13:12

jaypal singh · Answer 1 · 2014-04-05T16:03:04.213

3

The following solution uses GNU awk v3.2 features for sorting. This does not use multi-dimensional arrays. It only simulates one.

awk -F"^" '
NR>1{
    map[$1,$2] = $3
    name[$1]++
    value[$2]++
}
END{
    printf "Name"
    n = asorti(value, v_s)
    for(i=1; i<=n; i++) {
        printf "%s%s", FS, v_s[i]
    }
    print ""
    m = asorti(name, n_s)
    for(i=1; i<=m; i++) { 
        printf "%s", n_s[i]
        for(j=1; j<=n; j++) { 
            printf "%s%s", FS, map[n_s[i],v_s[j]]
        }
        print ""
    }
}' file
Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^
Name2^3^3^5^1^1^1^1^
Name3^^1^^^^^^1
Name4^^2^^5^^^^
Name5^^2^^1^^^^1

edited Apr 05 '14 at 16:03

answered Apr 05 '14 at 13:53

jaypal singh

74,723
23
102
147

1

+1 Awk has a builtin variable named SUBSEP which is used to separate pseudo-2d array indices when you use a comma between them so instead of `map[$1 FS $2]` you can write the more natural `map[$1,$2]`. Also `((map[n_s[i] FS v_s[j]]) ? map[n_s[i] FS v_s[j]] : "")` can be reduced to simply `map[$1,$2]` since if it's the null string (which is what you're currently testing for), that IS what you want to print anyway. – Ed Morton Apr 05 '14 at 15:44
1

Thanks so much @EdMorton. Great feedback. Appreciate it as always. – jaypal singh Apr 05 '14 at 16:03

Ed Morton · Answer 2 · 2014-04-05T16:49:52.890

This will work with any awk and will order the output of counts numerically while keeping the names in the order they occur in your input file:

$ cat tst.awk
BEGIN{FS="^"}

NR>1 {
    if (!seenNames[$1]++) {
        names[++numNames] = $1
    }

    if (!seenCodes[$2]++) {
        # Insertion Sort - start at the end of the existing array and
        # move everything greater than the current value down one slot
        # leaving open the slot for the current value to be inserted between
        # the last value smaller than it and the first value greater than it.
        for (j=++numCodes;codes[j-1]>$2+0;j--) {
            codes[j] = codes[j-1]
        }
        codes[j] = $2
    }

    count[$1,$2] = $3
}

END {
    printf "%s", "Name"
    for (j=1;j<=numCodes;j++) {
        printf "%s%s",FS,codes[j]
    }
    print ""

    for (i=1;i<=numNames;i++) {
        printf "%s", names[i]
        for (j=1;j<=numCodes;j++) {
            printf "%s%s",FS,count[names[i],codes[j]]
        }
        print ""
    }
}

...

$ awk -f tst.awk file
Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^
Name2^3^3^5^1^1^1^1^
Name3^^1^^^^^^1
Name4^^2^^5^^^^
Name5^^2^^1^^^^1

+1: The way you sort the second column is brilliant. If you get a chance, can you please put some explanation for it. I am still having a hard time following it (sorry!). — jaypal singh, Apr 05 '14 at 16:41
@JS웃 thanks but it's just a plain old Insertion Sort (see http://en.wikipedia.org/wiki/Insertion_sort for a nice graphic of it working). I added a comment. — Ed Morton, Apr 05 '14 at 16:50

D.Shawley · Answer 3 · 2014-04-07T11:41:13.580

1

Since you only have two "dimensions", it is easy enough to use one array for each dimension and a joining array with a calculated column name. I didn't do the sorting of columns or rows, but the idea is pretty basic.

#!/usr/bin/awk -f
#
BEGIN { FS = "^" }
(NR == 1) {next}

{
    rows[$1] = 1
    columns[$2] = 1
    join_table[$1 "-" $2] = $3
}

END {
    printf "Name"
    for (col_name in columns) {
        printf "^%s", col_name
    }
    printf "\n"
    for (row_name in rows) {
        printf row_name
        for (col_name in columns) {
            printf "^%s", join_table[row_name "-" col_name]
        }
        printf "\n"
    }
}

edited Apr 07 '14 at 11:41

answered Apr 05 '14 at 13:09

D.Shawley

58,213
10
98
113

Never use input data in the printf format field as it'll fail cryptically when the input data contains printf formatting characters. So, for example, do `printf "^%s", col_name` instead of `printf "^" col_name`. Also see my comment to JS about SUBSEP. – Ed Morton Apr 05 '14 at 16:26
1

@EdMorton - thanks… I'm not sure why I didn't think about that after seeing quite a few core dumps in C code for similar reasons. – D.Shawley Apr 07 '14 at 11:42

Pivot table in AWK

3 Answers3