3

I need to transform elements from an array to column index and return the value of $3 for each column index. I don´t have access to gawk 4 so I cannot work with real multidimensional arrays.

Input

Name^Code^Count
Name1^0029^1  
Name1^0038^1   
Name1^0053^1  
Name2^0013^3  
Name2^0018^3  
Name2^0023^5  
Name2^0025^1  
Name2^0029^1  
Name2^0038^1  
Name2^0053^1  
Name3^0018^1  
Name3^0060^1  
Name4^0018^2  
Name4^0025^5  
Name5^0018^2  
Name5^0025^1  
Name5^0060^1

Desired output

Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^  
Name2^3^3^5^1^1^1^1^  
Name3^^1^^^^^^1  
Name4^^2^^5^^^^  
Name5^^^^1^^^^1 

Any suggestions on how to tackle this task without using real multidimensional arrays?

Jotne
  • 40,548
  • 12
  • 51
  • 55
eh2deni
  • 69
  • 1
  • 1
  • 4

3 Answers3

3

The following solution uses GNU awk v3.2 features for sorting. This does not use multi-dimensional arrays. It only simulates one.

awk -F"^" '
NR>1{
    map[$1,$2] = $3
    name[$1]++
    value[$2]++
}
END{
    printf "Name"
    n = asorti(value, v_s)
    for(i=1; i<=n; i++) {
        printf "%s%s", FS, v_s[i]
    }
    print ""
    m = asorti(name, n_s)
    for(i=1; i<=m; i++) { 
        printf "%s", n_s[i]
        for(j=1; j<=n; j++) { 
            printf "%s%s", FS, map[n_s[i],v_s[j]]
        }
        print ""
    }
}' file
Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^
Name2^3^3^5^1^1^1^1^
Name3^^1^^^^^^1
Name4^^2^^5^^^^
Name5^^2^^1^^^^1
jaypal singh
  • 74,723
  • 23
  • 102
  • 147
  • 1
    +1 Awk has a builtin variable named SUBSEP which is used to separate pseudo-2d array indices when you use a comma between them so instead of `map[$1 FS $2]` you can write the more natural `map[$1,$2]`. Also `((map[n_s[i] FS v_s[j]]) ? map[n_s[i] FS v_s[j]] : "")` can be reduced to simply `map[$1,$2]` since if it's the null string (which is what you're currently testing for), that IS what you want to print anyway. – Ed Morton Apr 05 '14 at 15:44
  • 1
    Thanks so much @EdMorton. Great feedback. Appreciate it as always. – jaypal singh Apr 05 '14 at 16:03
3

This will work with any awk and will order the output of counts numerically while keeping the names in the order they occur in your input file:

$ cat tst.awk
BEGIN{FS="^"}

NR>1 {
    if (!seenNames[$1]++) {
        names[++numNames] = $1
    }

    if (!seenCodes[$2]++) {
        # Insertion Sort - start at the end of the existing array and
        # move everything greater than the current value down one slot
        # leaving open the slot for the current value to be inserted between
        # the last value smaller than it and the first value greater than it.
        for (j=++numCodes;codes[j-1]>$2+0;j--) {
            codes[j] = codes[j-1]
        }
        codes[j] = $2
    }

    count[$1,$2] = $3
}

END {
    printf "%s", "Name"
    for (j=1;j<=numCodes;j++) {
        printf "%s%s",FS,codes[j]
    }
    print ""

    for (i=1;i<=numNames;i++) {
        printf "%s", names[i]
        for (j=1;j<=numCodes;j++) {
            printf "%s%s",FS,count[names[i],codes[j]]
        }
        print ""
    }
}

...

$ awk -f tst.awk file
Name^0013^0018^0023^0025^0029^0038^0053^0060
Name1^^^^^1^1^1^
Name2^3^3^5^1^1^1^1^
Name3^^1^^^^^^1
Name4^^2^^5^^^^
Name5^^2^^1^^^^1
Ed Morton
  • 188,023
  • 17
  • 78
  • 185
  • 1
    +1: The way you sort the second column is brilliant. If you get a chance, can you please put some explanation for it. I am still having a hard time following it (sorry!). – jaypal singh Apr 05 '14 at 16:41
  • 1
    @JS웃 thanks but it's just a plain old Insertion Sort (see http://en.wikipedia.org/wiki/Insertion_sort for a nice graphic of it working). I added a comment. – Ed Morton Apr 05 '14 at 16:50
1

Since you only have two "dimensions", it is easy enough to use one array for each dimension and a joining array with a calculated column name. I didn't do the sorting of columns or rows, but the idea is pretty basic.

#!/usr/bin/awk -f
#
BEGIN { FS = "^" }
(NR == 1) {next}

{
    rows[$1] = 1
    columns[$2] = 1
    join_table[$1 "-" $2] = $3
}

END {
    printf "Name"
    for (col_name in columns) {
        printf "^%s", col_name
    }
    printf "\n"
    for (row_name in rows) {
        printf row_name
        for (col_name in columns) {
            printf "^%s", join_table[row_name "-" col_name]
        }
        printf "\n"
    }
}
D.Shawley
  • 58,213
  • 10
  • 98
  • 113
  • Never use input data in the printf format field as it'll fail cryptically when the input data contains printf formatting characters. So, for example, do `printf "^%s", col_name` instead of `printf "^" col_name`. Also see my comment to JS about SUBSEP. – Ed Morton Apr 05 '14 at 16:26
  • 1
    @EdMorton - thanks… I'm not sure why I didn't think about that after seeing quite a few core dumps in C code for similar reasons. – D.Shawley Apr 07 '14 at 11:42