Linux awk merge two files

Question

I have below script to combine two files.

awk -F"\t" '
    {key = $1}
    !(key in result) {result[key] = $0; next;}
    { for (i=2; i <= NF; i++) result[key] = result[key] FS $i }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"    # if using GNU awk
        for (key in result) print result[key]
    }
' $1 $2 > $3

First column is key and both $1 and $2. But if $2 column has a key but $1 column does not have key.

then also it's combining except $1 row.

I want to combine only incase $1 key is existed. How can I simply combine this two files?

For example,

File 1

Key    Column1  Column2  Column3  
Test1    500     400     200               
Test2    499     400     200               
Test5    600     200     150               
Test6    600     199     150               
Test7    599     199     100

File2

Key    Column4   Column5
Test1    Good     Good                    
Test2    Good     Good
Test3    Good     Good                    
Test4    Good     Good
Test5    Good     Good                    
Test6    Good     Good
Test7    Good     Good

Current Combine

Key    Column1  Column2  Column3  Column4   Column5
Test1    500     400     200       Good     Good     
Test2    499     400     200       Good     Good      
Test5    600     200     150       Good     Good          
Test6    600     199     150       Good     Good          
Test7    599     199     100       Good     Good  
Test3    Good    Good  
Test4    Good    Good

Expected Combine.

Key    Column1  Column2  Column3  Column4   Column5
Test1    500     400     200       Good     Good     
Test2    499     400     200       Good     Good      
Test5    600     200     150       Good     Good          
Test6    600     199     150       Good     Good          
Test7    599     199     100       Good     Good

Thank you!

The sentences `First column is key and both $1 and $2. But if $2 column has a key but $1 column does not have key. then also it's combining except $1 row.` do not mean anything. You should either change them to reflect what you mean or just delete them. — Ed Morton, Jun 05 '15 at 23:00

score 7 · Accepted Answer · answered Jun 05 '15 at 22:55

You're going about this wrong. What you are describing is a join operation and there is a perfectly good UNIX tool for that with a very obvious name:

$ join file1 file2 | column -t
Key    Column1  Column2  Column3  Column4  Column5
Test1  500      400      200      Good     Good
Test2  499      400      200      Good     Good
Test5  600      200      150      Good     Good
Test6  600      199      150      Good     Good
Test7  599      199      100      Good     Good

or if you insist on awk:

$ awk 'NR==FNR{m[$1]=$2" "$3; next} {print $0, m[$1]}' file2 file1 | column -t
Key    Column1  Column2  Column3  Column4  Column5
Test1  500      400      200      Good     Good
Test2  499      400      200      Good     Good
Test5  600      200      150      Good     Good
Test6  600      199      150      Good     Good
Test7  599      199      100      Good     Good

bkmoney · Answer 2 · 2015-06-06T01:08:30.073

Add a condition when storing into the array

{key = $1}
!(key in result) && NR == FNR {result[key] = $0; next;}
(key in result) { for (i=2; i <= NF; i++) {
    result[key] = result[key] FS $i
    }
}
END {
    PROCINFO["sorted_in"] = "@ind_str_asc"    # if using GNU awk
    for (key in result) print result[key]
}

The NR == FNR makes sure that the key we store into result is from the 1st file. We also add (key in result) to make sure the key exists before we iterate through the for loop.

score 4 · Answer 3 · answered Jun 05 '15 at 22:16

You can try following command:

awk '
    BEGIN { FS = OFS = "\t" }
    {key = $1}
    FNR == NR {result[key] = $0; next;}
    (key in result) { for (i=2; i <= NF; i++) result[key] = result[key] FS $i }
    END {
        PROCINFO["sorted_in"] = "@ind_str_asc"    # if using GNU awk
        for (key in result) print result[key]
    }
' file1 file2

I've changed those checks. The FNR == NR only saves in result lines from first file. And (key in result) applies for second file, and only appends the columns for those keys found previously in the first file.

It yields:

Key     Column1 Column2 Column3         Column4 Column5
Test1   500     400     200             Good    Good
Test2   499     400     200             Good    Good
Test5   600     200     150             Good    Good
Test6   600     199     150             Good    Good
Test7   599     199     100             Good    Good

Linux awk merge two files

3 Answers3