2

my CSV data looks like this:

Indicator;Country;Value
no_of_people;USA;500
no_of_people;Germany;300
no_of_people;France;200
area_in_km;USA;18
area_in_km;Germany;16
area_in_km;France;17
proportion_males;USA;5.3
proportion_males;Germany;7.9
proportion_males;France;2.4

I want my data to look like this:

Country;no_of_people;area_in_km;proportion_males
USA;500;18;5.3
Germany;300;16;7.9
France;200;17;2.4

There are more Indicators and more countries than listed here.

Pretty large files (number of rows something with 5 digits). Looked around for some transpose threads, but nothing matched my situation (also I'm quite new to awk, so I couldn't change the code I found to fit my data).

Thanks for your help. Regards Ad

user3657346
  • 23
  • 1
  • 4

2 Answers2

2

If the number of Ind fields is fixed, you can do:

awk 'BEGIN{FS=OFS=";"}
     {a[$2,$1]=$3; count[$2]}
     END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]}' file

Explanation

  • BEGIN{FS=OFS=";"} set input and output field separator as semicolon.
  • {a[$2,$1]=$3; count[$2]} get list of countries in count[] array and values of each Ind on a["country","Ind"] array.
  • END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]} print the summary of the values.

Output

$ awk 'BEGIN{FS=OFS=";"} {a[$2,$1]=$3; count[$2]} END {for (i in count) print i, a[i,"Ind1"], a[i, "Ind2"], a[i, "Ind3"]}' file
France;200;17;2.4
Germany;300;16;7.9
USA;500;18;5.3

Update

unfortunately, the number of Indicators is not fixed. Also, they are not named like "Ind1", "Ind2" etc. but are just strings.' I clarified my question.

$ awk -v FS=";" '{a[$2,$1]=$3; count[$2]; indic[$1]} END {for (j in indic) printf "%s ", j; printf "\n"; for (i in count) {printf "%s ", i; for (j in indic) printf "%s ", a[i,j]; printf "\n"}}' file
proportion_males no_of_people area_in_km 
France 2.4 200 17 
Germany 7.9 300 16 
USA 5.3 500 18 

To have ; separated, do replace each space with ;:

$ awk -v FS=";" '{a[$2,$1]=$3; count[$2]; indic[$1]} END {for (j in indic) printf "%s ", j; printf "\n"; for (i in count) {printf "%s ", i; for (j in indic) printf "%s ", a[i,j]; printf "\n"}}' file | tr ' ' ';'
proportion_males;no_of_people;area_in_km;
France;2.4;200;17;
Germany;7.9;300;16;
USA;5.3;500;18;
fedorqui
  • 275,237
  • 103
  • 548
  • 598
  • thanks. unfortunately, the number of Indicators is not fixed. Also, they are not named like "Ind1", "Ind2" etc. but are just strings.' I clarified my question. – user3657346 May 20 '14 at 16:19
  • Oh I see. Then we have to keep track of the indicators. Check my updated answer! – fedorqui May 20 '14 at 16:28
1

Using awk and maintaining the order of output:

awk -F\; '
NR>1 { 
    if(!($1 in indicators)) { indicator[++types] = $1 }; indicators[$1]++  
    if(!($2 in countries)) { country[++num] = $2 }; countries[$2]++
    map[$1,$2] = $3 
}
END {
    printf "%s;" ,"Country";
    for(ind=1; ind<=types; ind++) {
        printf "%s%s", sep, indicator[ind]; 
        sep = ";"
    }
    print "";
    for(coun=1; coun<=num; coun++) {
        printf "%s", country[coun]
        for(val=1; val<=types; val++) {
            printf "%s%s", sep, map[indicator[val], country[coun]];
        }
        print ""
    }
}' file
Country;no_of_people;area_in_km;proportion_males
USA;500;18;5.3
Germany;300;16;7.9
France;200;17;2.4
jaypal singh
  • 74,723
  • 23
  • 102
  • 147