Your own attempt basically has two bugs, as seen in @John1024's answer:
- You use field 2 as both key and value in `a`, where you should be storing field 3 as the value (since you want to keep it for later), i.e., it should be `a[$2] = $3`.
- The test `a[$6]` is false when the value stored in `a` is zero, even if the key exists. The correct test is `$6 in a`.
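The second point is easy to check in isolation. This is only a minimal sketch: the one-entry array and its key `k` are made up for illustration and are not taken from your files.

```shell
# Hypothetical one-entry array: the key "k" exists, but its value is 0.
result=$(awk 'BEGIN {
    a["k"] = 0
    if (a["k"])   print "a[k]: true";   else print "a[k]: false"
    if ("k" in a) print "k in a: true"; else print "k in a: false"
}')
printf '%s\n' "$result"
# prints:
#   a[k]: false
#   k in a: true
```

The value test reports false for a key that is present, while the membership test reports true.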
Hence:
awk 'NR==FNR { a[$2]=$3; next } $6 in a {print $0, a[$6] }' file2 file1
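To see the corrected command at work, here is a small run on invented data. The file contents below are assumptions (only the field positions match your description: key in field 2 and value in field 3 of file2, key in field 6 of file1), and the files are created in a temporary directory so nothing real is touched.

```shell
# Hypothetical sample data; the real files will look different.
tmp=$(mktemp -d)
printf '%s\n' 'x k1 0' 'x k2 7' > "$tmp/file2"                               # key = field 2, value = field 3
printf '%s\n' 'a b c d e k1' 'a b c d e k9' 'a b c d e k2' > "$tmp/file1"    # key = field 6

out=$(awk 'NR==FNR { a[$2]=$3; next } $6 in a { print $0, a[$6] }' "$tmp/file2" "$tmp/file1")
printf '%s\n' "$out"
# prints:
#   a b c d e k1 0
#   a b c d e k2 7
rm -rf "$tmp"
```

The `k1` line is kept even though its stored value is 0, and the `k9` line is dropped because that key never appears in file2.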
However, there might be better approaches, but that is not clear from your specification. For instance, you say that `file2` has over 4 million lines, but it is unknown whether field 2 also has that many unique values. If it does, then `a` will hold that many entries in memory. You also don't specify how long `file1` is, whether its order must be preserved in the output, or whether every line (even those without a match in `file2`) should be output.
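If you want to find out how many entries the array would hold, one possible check is to count the distinct values of field 2. This is a sketch on a made-up stand-in file (the `sample` file and its contents are assumptions); for the real data you would point the pipeline at file2 instead.

```shell
# Hypothetical sample standing in for the real file2.
sample=$(mktemp)
printf '%s\n' 'x k1 0' 'x k2 7' 'x k1 3' > "$sample"

# Number of distinct field-2 values = number of entries the array would hold.
# sort does the de-duplication here, so awk itself keeps nothing in memory.
unique=$(awk '{ print $2 }' "$sample" | sort -u | wc -l)
echo "unique keys: $((unique))"
# prints: unique keys: 2
rm -f "$sample"
```

Note that when a key repeats, as `k1` does here, `a[$2]=$3` simply overwrites the earlier value, so the last line for that key wins.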
If it is the case that `file1` has many fewer lines than `file2` has unique values for field 2, and only matching lines need to be output, and order does not need to be preserved, then you might wish to read `file1` first…