1

I have 2 files,

head file1

ESC_KB2908AA_AS  1  Eco-19-00825
ESC_KB2909AA_AS  2  Eco-20-00081
ESC_KB2910AA_AS  3  Eco-20-00128
ESC_KB2911AA_AS  4  Eco-19-00991

and head file2

ESC_KB2908AA_AS
ESC_KB2908AA_AS
ESC_KB2908AA_AS

ESC_KB2909AA_AS
ESC_KB2909AA_AS
ESC_KB2909AA_AS

ESC_KB2910AA_AS
ESC_KB2910AA_AS

ESC_KB2911AA_AS
ESC_KB2911AA_AS

I want to check if the first column of file1 matches to the first column of file2 and if ID matched then print the output according to file2.

Expected file output :

ESC_KB2908AA_AS   1  Eco-19-00825
ESC_KB2908AA_AS   1  Eco-19-00825
ESC_KB2908AA_AS   1  Eco-19-00825

ESC_KB2909AA_AS  2  Eco-20-00081
ESC_KB2909AA_AS  2  Eco-20-00081
ESC_KB2909AA_AS  2  Eco-20-00081

ESC_KB2910AA_AS  3  Eco-20-00128
ESC_KB2910AA_AS  3  Eco-20-00128

ESC_KB2911AA_AS  4  Eco-19-00991
ESC_KB2911AA_AS  4  Eco-19-00991

I tried the following code but I am getting output only for last line of file2. Could you please help what's wrong with script?

awk 'FNR==NR{key[$1]=$2;next} {$2=key[$1]}1' file1.txt file2.txt > output.txt
  • If the files are sorted, you can probably use `join` – Shawn Jan 28 '22 at 11:35
  • 1
    Run `dos2unix` or similar on your input files as what you say is happening ("getting output only for last line of file2") is impossible given regular text files as input, see [why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it](https://stackoverflow.com/questions/45772525/why-does-my-tool-output-overwrite-itself-and-how-do-i-fix-it), but maybe you have a bunch of `\r`s hiding the output. That `1` at the end of your script means "unconditionally print the current line" so it **WILL** print every line from file2. – Ed Morton Jan 28 '22 at 14:00

1 Answers1

1

This awk should work for you:

awk 'FNR==NR{key[$1] = $0; next} !NF || $1 in key {print key[$1]}' file1 file2
ESC_KB2908AA_AS  1  Eco-19-00825
ESC_KB2908AA_AS  1  Eco-19-00825
ESC_KB2908AA_AS  1  Eco-19-00825

ESC_KB2909AA_AS  2  Eco-20-00081
ESC_KB2909AA_AS  2  Eco-20-00081
ESC_KB2909AA_AS  2  Eco-20-00081

ESC_KB2910AA_AS  3  Eco-20-00128
ESC_KB2910AA_AS  3  Eco-20-00128

ESC_KB2911AA_AS  4  Eco-19-00991
ESC_KB2911AA_AS  4  Eco-19-00991

Condition !NF || $1 in key will print if it is a blank line or else of $1 exists in key array which store full line from file1 with key as $1 from the same file.

anubhava
  • 761,203
  • 64
  • 569
  • 643