How to merge two files with different fields based on one matching columns?

Question

I have two files as below and I want to merge them into one file based on common IDs:

File1:

ARS     8.0   8.0
BBL     1.1   1.2
CCL     1.9   1.8

File2:

ARS     2.3   2.4
ARS     2.6   2.4
ARS     2.5   2.3
BBL     1.9   1.8
EDE     1.4   1.6

Desired output:

ARS     8.0   8.0    ARS     2.3   2.4
ARS     8.0   8.0    ARS     2.6   2.4   
ARS     8.0   8.0    ARS     2.5   2.3
BBL     1.1   1.2    BBL     1.9   1.8
CCL     1.9   1.8    NA
NA                   EDE     1.4   1.6

Are those tab-separated fields or fixed-width or some number of blank chars or something else? — Ed Morton, Sep 21 '17 at 13:36
Possible duplicate of [Linux - join 2 CSV files](https://stackoverflow.com/questions/8820778/linux-join-2-csv-files) — JNevill, Sep 21 '17 at 13:36
Can `ARS`, for example, appear multiple times in file1 just like it can in file2? If so include a case like that in your example. — Ed Morton, Sep 21 '17 at 13:44
I've voted to close this question because it appears to be a request for a recommendation for a tool or solution, rather than a request for assistance with your own code. This makes your question off-topic for StackOverflow. If that assessment was incorrect, and you do indeed want help writing your own code, then please [add your work so far to your question](https://stackoverflow.com/posts/46345345/edit) and I'll happily retract my close vote. — ghoti, Sep 21 '17 at 14:48

Ed Morton · Answer 1 · 2017-09-21T15:44:25.970

$ cat tst.awk
BEGIN { FS=OFS="\t" }
FNR==1 {
    na = $0
    gsub("[^"FS"]","",na)
    nas[++numFiles] = "NA" na
}
NR==FNR { file1[$1] = $0; next }
$1 in file1 { print file1[$1], $0 }
{ file2[$1] = $0 }
END {
    for (key in file1) {
        if ( !(key in file2) ) {
            print file1[key], nas[2]
        }
    }
    for (key in file2) {
        if ( !(key in file1) ) {
            print nas[1], file2[key]
        }
    }
}

.

$ awk -f tst.awk file1 file2
ARS     8.0     8.0     ARS     2.3     2.4
ARS     8.0     8.0     ARS     2.6     2.4
ARS     8.0     8.0     ARS     2.5     2.3
BBL     1.1     1.2     BBL     1.9     1.8
CCL     1.9     1.8     NA
NA                      EDE     1.4     1.6

There's various ways it could be optimized to not store the whole contents of both files in arrays but I like the simplicity and symmetry of the above and it's trivial to optimize later if that proves to be necessary due to massive input files.

`^1` liked the way you used `gsub` – Akshay Hegde Sep 21 '17 at 15:37 — Akshay Hegde, Sep 21 '17 at 15:37

RavinderSingh13 · Accepted Answer · 2017-09-21T17:13:49.630

1

Could you please try following and let me know if this helps you.(I have kept the order of NA at last only let me know if this helps you). Adding a non-one liner form of solution too here.

awk '
FNR==NR{
  a[$1]=$0;
  next
}
($1 in a){
  print a[$1],$0;
  b[$1];
  next
}
{
  print "NA\t",$0       
}
END{
  for(i in b){
    delete a[i]
};
  for(j in a){
  print a[j],"\tNA"
}
}
' file1  file2

Output will be as follows.

ARS     8.0   8.0 ARS     2.3   2.4
ARS     8.0   8.0 ARS     2.6   2.4
ARS     8.0   8.0 ARS     2.5   2.3
BBL     1.1   1.2 BBL     1.9   1.8
NA   EDE     1.4   1.6
CCL     1.9   1.8   NA

edited Sep 21 '17 at 17:13

answered Sep 21 '17 at 14:01

RavinderSingh13

130,504
14
57
93

Thanks RavinderSingh The second one worked BUT I want NA in the right columns to recognise which file it belongs to so say if ID only comes from file 1, NA should appear on the right side in output. (have a look at my question). Here EDE should appear on the right and NA on the left and the opposite for CCL. – Behmah Sep 21 '17 at 14:30
@Behmah, please check my edited solution now and let me know. – RavinderSingh13 Sep 21 '17 at 14:38
It didn't change :( – Behmah Sep 21 '17 at 14:46
When I am running my code I am getting same output as I mentioned in my post, let me know if you are running exactly same code? – RavinderSingh13 Sep 21 '17 at 14:48
You are sure you make the edition in the code above? – Behmah Sep 21 '17 at 14:57
yes, 110% I re-ran it again and same shown output which I mentioned in my post. – RavinderSingh13 Sep 21 '17 at 15:03
sorry Ravinder it works for this example but not for my actual file maybe because my actual file is tab-space ?? – Behmah Sep 21 '17 at 15:07
ahhh apologies for it, try adding -F'\t" after awk and let me know then, edited code now too. – RavinderSingh13 Sep 21 '17 at 15:09
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/155012/discussion-between-behmah-and-ravindersingh13). – Behmah Sep 21 '17 at 15:17
Thank you Ed, but in above one of the comments OP mentioned in CCL it should be NA in left most side so I changed my code as per the same. – RavinderSingh13 Sep 21 '17 at 16:46
1

I apologies Ed, I really overlooked it, I edited my code now with output(specially NA's place) – RavinderSingh13 Sep 21 '17 at 17:14

ULick · Answer 3 · 2017-09-22T16:42:51.983

1

There is a tool for merging files

join -a 1 -a 2 -e NA t24.in1 t24.in2  -o 1.1,1.2,1.3,2.1,2.2,2.3

output:

ARS     8.0     8.0     ARS     2.3     2.4
ARS     8.0     8.0     ARS     2.6     2.4
ARS     8.0     8.0     ARS     2.5     2.3
BBL     1.1     1.2     BBL     1.9     1.8
CCL     1.9     1.8     NA      NA      NA
NA      NA      NA      EDE     1.4     1.6

edited Sep 22 '17 at 16:42

answered Sep 22 '17 at 16:31

ULick

969
6
12

How to merge two files with different fields based on one matching columns?

3 Answers3

Linked