
Hi, I have more than 10 files (only two columns each) and I want to join them by row name:

file1   
a   3
b   4
c   6

file2   
c   7
b   33
f   56

file3   
d   4
e   9
f   44
a   99

Output          
    file1   file2   file3
a   3   0   99
b   4   33  0
c   6   7   0
e   0   0   9
d   0   0   4
f   0   56  44

The question linked below answers my requirement perfectly, but it only works for two files:

Join multiple tables by row names

How do I do the same for multiple (n) files? I'm new to shell commands.

Kiran
  • It is always recommended to post 3 simple things: 1st, a sample of the input; 2nd, a sample of the expected output; and 3rd, your efforts. So kindly edit your post with more details and let us know. – RavinderSingh13 Oct 02 '19 at 08:40
  • When I try to post the sample input and output, the formatting gets messed up, so I thought I'd add the link to the previous question. – Kiran Oct 02 '19 at 09:16
  • It is better to post simple samples that are close to your actual Input_files; you can shorten your samples and still give us a clear picture of your question. – RavinderSingh13 Oct 02 '19 at 09:19
  • Added a sample image. – Kiran Oct 02 '19 at 09:47
  • Your sample data needs to be text in the question so it can easily be copied and pasted for testing, not an image. – Shawn Oct 02 '19 at 11:04
  • Now I was able to format the input and put it here. – Kiran Oct 02 '19 at 12:13
  • @Kiran, your question implies a working solution for 2 tables in the link. However, in your example the files are not sorted, so plain join will not work. Can you clarify whether the data in each file is sorted by the first column? – dash-o Oct 02 '19 at 12:47

2 Answers


With GNU awk for arrays of arrays and sorted_in:

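$ cat tst.awk
BEGIN { OFS="\t" }
{ vals[$1][ARGIND] = $2 }    # value of row name $1 in the ARGIND-th file
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"    # visit row names in string order
    # header: an empty cell for the key column, then the file names
    printf "%s", OFS
    for (fileNr=1; fileNr<=ARGIND; fileNr++) {
        printf "%s%s", ARGV[fileNr], (fileNr<ARGIND ? OFS : ORS)
    }
    # one row per name; +0 turns a missing value into 0
    for (key in vals) {
        printf "%s%s", key, OFS
        for (fileNr=1; fileNr<=ARGIND; fileNr++) {
            printf "%s%s", vals[key][fileNr]+0, (fileNr<ARGIND ? OFS : ORS)
        }
    }
}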

$ awk -f tst.awk file1 file2 file3
        file1   file2   file3
a       3       0       99
b       4       33      0
c       6       7       0
d       0       0       4
e       0       0       9
f       0       56      44
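If GNU awk isn't available (the script above relies on gawk's arrays of arrays, ARGIND and PROCINFO["sorted_in"]), here is a sketch of the same idea for any POSIX awk. It keys the values by row name and FILENAME in an ordinary array and leaves the row ordering to sort. The function name join_tables is made up for illustration, and it assumes every command-line argument is a file name (no var=value assignments).

```sh
# join_tables is a hypothetical helper name; any POSIX awk should do
# (no arrays of arrays, ARGIND or PROCINFO). Assumes every argument is a file.
join_tables() {
    awk '
    BEGIN { OFS = "\t" }
    NF {                                  # skip blank lines
        keys[$1] = 1                      # every row name seen in any file
        vals[$1, FILENAME] = $2           # value of that row name in this file
    }
    END {
        # Header: an empty cell for the key column, then one column per file.
        hdr = ""
        for (i = 1; i < ARGC; i++) hdr = hdr OFS ARGV[i]
        print hdr
        fflush()                          # write the header before sort output
        cmd = "LC_ALL=C sort"             # byte-order sort of the finished rows
        for (k in keys) {
            line = k
            for (i = 1; i < ARGC; i++) {
                v = 0                     # default for a missing value
                if ((k, ARGV[i]) in vals) v = vals[k, ARGV[i]]
                line = line OFS v
            }
            print line | cmd
        }
        close(cmd)                        # wait for sort to finish
    }' "$@"
}
```

Usage: join_tables file1 file2 file3 (or join_tables file*). For the sample files it should print the same tab-separated table as the gawk script.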
Ed Morton
  • Thanks a lot @Morton, that's amazing and works perfectly. I would never have managed it with the complex syntax of awk commands. – Kiran Oct 03 '19 at 05:28
  • @Kiran, "complex syntax of awk commands"? No. Awk is as simple as it gets: it's just a VERY stripped-down version of C wrapped in an implicit `while read line; do split line into fields .... ; done` loop. If you think it's complex then you're very much missing something! – Ed Morton Oct 03 '19 at 13:43
  • Actually I am a life-science (biotech) professional, so it's a bit complex for me to imbibe it all :-) – Kiran Oct 03 '19 at 18:55

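If the input files are not sorted, an explicit sort is needed, because join only works on inputs that are sorted on the join field. Process substitution keeps the solution simple (it is a bash feature, so the script must run under bash):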

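#!/bin/bash

# Helper function: join with the required options.
# -a1 -a2 keep unpaired lines from both files, -e0 fills missing fields with 0,
# -oauto (GNU join) keeps the column count fixed on unpaired lines.
function j {
    join -a1 -a2 -oauto -e0 "$@"
}

echo "file1" "file2" "file3"
j <(sort file1) <(sort file2) | j - <(sort file3)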

Output:

file1 file2 file3
a 3 0 99
b 4 33 0
c 6 7 0
d 0 0 4
e 0 0 9
f 0 56 44

Or with column -t:

echo "key" "file1" "file2" "file3"
j <(sort file1) <(sort file2) | j - <(sort file3) | column -t
key  file1  file2  file3
a    3      0      99
b    4      33     0
c    6      7      0
d    0      0      4
e    0      0      9
f    0      56     44
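The pipeline above needs one more j stage per file. For an arbitrary number of files, one option is to fold the same join over all arguments, keeping the running result in a temporary file. This is only a sketch, assuming POSIX sh, mktemp and GNU coreutils join (for -o auto); join_all is a hypothetical name.

```sh
# Hypothetical helper: join_all FILE... (POSIX sh + GNU coreutils join).
# The body runs in a subshell so its variables do not leak into the caller.
join_all() (
    printf '%s\n' "key $*"                      # header row
    acc=$(mktemp) && nxt=$(mktemp) && srt=$(mktemp) || exit 1
    LC_ALL=C sort "$1" > "$acc"                 # running result starts as file 1
    shift
    for f in "$@"; do
        LC_ALL=C sort "$f" > "$srt"             # join needs both inputs sorted
        # -a1 -a2: keep unpaired keys from either side; -e0: fill gaps with 0;
        # -o auto: keep a fixed column count even on unpaired lines.
        LC_ALL=C join -a1 -a2 -o auto -e0 "$acc" "$srt" > "$nxt"
        cp "$nxt" "$acc"
    done
    cat "$acc"
    rm -f "$acc" "$nxt" "$srt"
)
```

For the sample files, join_all file1 file2 file3 | column -t should give the aligned table above. Because every step re-sorts and rewrites the running result, this is likely to be slower than the single-pass awk approach when there are many large files.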
dash-o
  • @Thor, I am not sure how to use this along with the function and echo. I have more than 10 files; do I have to keep piping them like "j <(sort file1) <(sort file2) | j - <(sort file3)" all the way along? Thanks – Kiran Oct 03 '19 at 05:24
  • Oops, I missed the point about the large number of files. The above works for a small, fixed number of files. It is possible to iterate using a temp file, but the awk-based solution above is probably where you want to go. – dash-o Oct 03 '19 at 17:38