5

I would like to merge multiple tables by row names. The tables differ in the amount of rows and they have unique and shared rows, which should all appear in output. If possible I would like to solve the problem with awk, but I am also fine with other solutions.

table1.tab

a 5
b 5
d 9

table2.tab

a 1
b 2
c 8
e 11

The output I would like to obtain the following table:

table3.tab

a 5 1
b 5 2
d 9 0
c 0 8
e 0 11

I tried using join

join table1.tab table2.tab > table3.tab

but I get

table3.tab

a 5 1
b 5 2

row c, d and e are not in the output.

Brian Tompsett - 汤莱恩
  • 5,753
  • 72
  • 57
  • 129
user2715173
  • 53
  • 1
  • 3

2 Answers2

7

You want to do a full outer join:

join -a1 -a2 -o 0 1.2 2.2 -e "0" table1.tab table2.tab

a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
Clayton Stanley
  • 7,513
  • 9
  • 32
  • 46
2

this awk oneliner should work for your example:

awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}
END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}' table1 table2

test

kent$  head f1 f2
==> f1 <==
a 5
b 5
d 9

==> f2 <==
a 1
b 2
c 8
e 11

kent$  awk 'NR==FNR{a[$1]=$2;k[$1];next}{b[$1]=$2;k[$1]}END{for(x in k)printf"%s %d %d\n",x,a[x],b[x]}'  f1 f2
a 5 1
b 5 2
c 0 8
d 9 0
e 0 11
Kent
  • 189,393
  • 32
  • 233
  • 301
  • Thanks a lot for the answer! Works very well to solve the given problem! Would awk still be applicable to merge more than 2 tables in that way? What is the best approach to modify the one-liner to add additional columns derived from n (e.g. 20) tables? – user2715173 Aug 25 '13 at 11:46
  • @user2715173 gawk has ARGIND variable, it is useful to handle multi input files, particularly >3 case. I would suggest using "arrays of arrays" to do the n tables case. you can find some detail here:http://www.gnu.org/software/gawk/manual/html_node/Arrays-of-Arrays.html – Kent Aug 25 '13 at 12:23