-4

I have two files as below and I want to merge them into one file based on common IDs:

File1:

ARS     8.0   8.0
BBL     1.1   1.2
CCL     1.9   1.8

File2:

ARS     2.3   2.4
ARS     2.6   2.4
ARS     2.5   2.3
BBL     1.9   1.8
EDE     1.4   1.6

Desired output:

ARS     8.0   8.0    ARS     2.3   2.4
ARS     8.0   8.0    ARS     2.6   2.4   
ARS     8.0   8.0    ARS     2.5   2.3
BBL     1.1   1.2    BBL     1.9   1.8
CCL     1.9   1.8    NA
NA                   EDE     1.4   1.6
Behmah
  • 137
  • 2
  • 10
  • Are those tab-separated fields or fixed-width or some number of blank chars or something else? – Ed Morton Sep 21 '17 at 13:36
  • Possible duplicate of [Linux - join 2 CSV files](https://stackoverflow.com/questions/8820778/linux-join-2-csv-files) – JNevill Sep 21 '17 at 13:36
  • They are all tab-spaced. – Behmah Sep 21 '17 at 13:40
  • Can `ARS`, for example, appear multiple times in file1 just like it can in file2? If so include a case like that in your example. – Ed Morton Sep 21 '17 at 13:44
  • JNevil, That is different – Behmah Sep 21 '17 at 13:45
  • Ed Morton, No, duplicates are just in file 2. – Behmah Sep 21 '17 at 13:46
  • I've voted to close this question because it appears to be a request for a recommendation for a tool or solution, rather than a request for assistance with your own code. This makes your question off-topic for StackOverflow. If that assessment was incorrect, and you do indeed want help writing your own code, then please [add your work so far to your question](https://stackoverflow.com/posts/46345345/edit) and I'll happily retract my close vote. – ghoti Sep 21 '17 at 14:48

3 Answers3

2
$ cat tst.awk
BEGIN { FS=OFS="\t" }
FNR==1 {
    na = $0
    gsub("[^"FS"]","",na)
    nas[++numFiles] = "NA" na
}
NR==FNR { file1[$1] = $0; next }
$1 in file1 { print file1[$1], $0 }
{ file2[$1] = $0 }
END {
    for (key in file1) {
        if ( !(key in file2) ) {
            print file1[key], nas[2]
        }
    }
    for (key in file2) {
        if ( !(key in file1) ) {
            print nas[1], file2[key]
        }
    }
}

.

$ awk -f tst.awk file1 file2
ARS     8.0     8.0     ARS     2.3     2.4
ARS     8.0     8.0     ARS     2.6     2.4
ARS     8.0     8.0     ARS     2.5     2.3
BBL     1.1     1.2     BBL     1.9     1.8
CCL     1.9     1.8     NA
NA                      EDE     1.4     1.6

There's various ways it could be optimized to not store the whole contents of both files in arrays but I like the simplicity and symmetry of the above and it's trivial to optimize later if that proves to be necessary due to massive input files.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
1

Could you please try following and let me know if this helps you.(I have kept the order of NA at last only let me know if this helps you). Adding a non-one liner form of solution too here.

awk '
FNR==NR{
  a[$1]=$0;
  next
}
($1 in a){
  print a[$1],$0;
  b[$1];
  next
}
{
  print "NA\t",$0       
}
END{
  for(i in b){
    delete a[i]
};
  for(j in a){
  print a[j],"\tNA"
}
}
' file1  file2

Output will be as follows.

ARS     8.0   8.0 ARS     2.3   2.4
ARS     8.0   8.0 ARS     2.6   2.4
ARS     8.0   8.0 ARS     2.5   2.3
BBL     1.1   1.2 BBL     1.9   1.8
NA   EDE     1.4   1.6
CCL     1.9   1.8   NA
RavinderSingh13
  • 130,504
  • 14
  • 57
  • 93
1

There is a tool for merging files

join -a 1 -a 2 -e NA t24.in1 t24.in2  -o 1.1,1.2,1.3,2.1,2.2,2.3

output:

ARS     8.0     8.0     ARS     2.3     2.4
ARS     8.0     8.0     ARS     2.6     2.4
ARS     8.0     8.0     ARS     2.5     2.3
BBL     1.1     1.2     BBL     1.9     1.8
CCL     1.9     1.8     NA      NA      NA
NA      NA      NA      EDE     1.4     1.6
ULick
  • 969
  • 6
  • 12