How to use awk to test if a column value is in another file?

Question

I want to do something like

if ($2 in another file) { print $0 }

So say I have file A.txt which contains

aa
bb
cc

I have B.txt like

00,aa
11,bb
00,dd

I want to print

00,aa
11,bb

How do I test that in awk? I am not familiar with the tricks of processing two files at a time.

`grep -Ff patternFile searchFile` OR `fgrep -fpatternFile searchFile` will do that. Good luck. — shellter, Mar 01 '16 at 21:56

score 7 · Accepted Answer · answered Mar 01 '16 at 21:57

You could use something like this:

awk -F, 'NR == FNR { a[$0]; next } $2 in a' A.txt B.txt

This saves each line from A.txt as a key in the array a and then prints any lines from B.txt whose second field is in the array.

NR == FNR is the standard way to target the first file passed to awk, as NR (the total record number) is only equal to FNR (the record number for the current file) for the first file. next skips to the next record so the $2 in a part is never reached until the second file.
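To see the two phases separately, here is the same one-liner spelled out with comments, as a minimal sketch. It recreates the A.txt and B.txt samples from the question in a scratch directory; the directory, the array name `seen`, and the `result` variable are illustrative choices that only make the example self-contained.

```shell
#!/bin/sh
# Recreate the sample files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' aa bb cc > "$dir/A.txt"
printf '%s\n' 00,aa 11,bb 00,dd > "$dir/B.txt"

result=$(awk -F, '
    # First file only: NR (global count) still equals FNR (per-file count).
    # Store the whole line as an array key, then skip to the next record.
    NR == FNR { seen[$0]; next }

    # Second file: a bare condition prints the line when it is true.
    $2 in seen
' "$dir/A.txt" "$dir/B.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```

With the sample data this prints the two expected lines, 00,aa and 11,bb, in the order they appear in B.txt.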

score 2 · Answer 2 · answered Mar 01 '16 at 22:15

An alternative is join,

if both files are sorted on the join field:

$ join -t, -1 1 -2 2 -o2.1,2.2 file1 file2

00,aa
11,bb

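This sets the delimiter to a comma, joins the first field of the first file with the second field of the second file, and uses -o to print the second file's fields in their original order (by default join prints the join field first, which would swap them). If the files are not sorted you need to sort them first, but then awk might be a better choice.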
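For unsorted input, the sort-then-join route could look like the sketch below. It assumes deliberately shuffled copies of the question's sample files; the `.sorted` file names and the `joined` variable are made up for the example, and `LC_ALL=C` is set so that sort and join agree on the collation order.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Unsorted versions of the sample files.
printf '%s\n' cc aa bb > "$dir/A.txt"
printf '%s\n' 00,dd 11,bb 00,aa > "$dir/B.txt"

# join needs both inputs sorted on the join field.
LC_ALL=C sort "$dir/A.txt" > "$dir/A.sorted"
LC_ALL=C sort -t, -k2,2 "$dir/B.txt" > "$dir/B.sorted"

joined=$(LC_ALL=C join -t, -1 1 -2 2 -o 2.1,2.2 "$dir/A.sorted" "$dir/B.sorted")
printf '%s\n' "$joined"
rm -rf "$dir"
```

Note that the result comes out in join-key order (00,aa before 11,bb), not in the original order of B.txt, which is one more reason to prefer awk when the input order matters.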

score 1 · Answer 3 · answered Mar 01 '16 at 22:01

There seem to be two schools of thought on the matter. Some prefer to use the BEGIN-based idiom, and others the FNR-based idiom.

Here's the essence of the former:

awk -v infile=INFILE '
  BEGIN { while( (getline < infile)>0 ) { .... } }
  ... '

For the latter, just search for:

awk 'FNR==NR'
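As a sketch of how the BEGIN-based idiom might be filled in for the files in the question (the array name `seen`, the `line` variable, and the `begin_result` shell variable are assumptions made for illustration, not part of the original answer):

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' aa bb cc > "$dir/A.txt"
printf '%s\n' 00,aa 11,bb 00,dd > "$dir/B.txt"

begin_result=$(awk -F, -v infile="$dir/A.txt" '
    BEGIN {
        # Read the lookup file by hand; "> 0" stops on EOF (0) and on error (-1).
        while ((getline line < infile) > 0)
            seen[line]
        close(infile)
    }
    # Main rule: only B.txt is read as regular input.
    $2 in seen
' "$dir/B.txt")

printf '%s\n' "$begin_result"
rm -rf "$dir"
```

Here the lookup file is passed as a variable and the data file as the only operand, so no per-record test is needed to tell the two files apart.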

answered Mar 01 '16 at 22:01 by peak

It's not outrageous to use BEGIN for this (see http://awk.info/?tip/getline) but it's hard to understand why you'd want to write `BEGIN { while( (getline line < "file")>0 ) a[line]; close("file") }` vs `NR==FNR{a[$0];next}`. The only reason I can think of to use the former might be if you were worried about `file` being empty but then I'd still just use `FILENAME==ARGV[1]` or similar instead of `NR==FNR`. – Ed Morton Mar 01 '16 at 22:51
The BEGIN-based idiom has the advantage of clearly expressing intention, and does not have the overhead of NR==FNR, no? – peak Mar 02 '16 at 00:34
IMHO it more obfuscates the code for those of us used to the `NR==FNR` approach (gets us wondering what's going on to make that necessary) but I can understand the opposite point of view as well. It does remove a test per line in the 2nd file but I suspect the manually written getline loop is slower than the builtin loop so I'm really not sure if there'd be an overall performance difference either way - a test could be done of course, if anyone cared ... :-). – Ed Morton Mar 02 '16 at 00:50
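The empty-file concern raised in these comments can be made concrete with a small sketch. To make the difference visible it uses the negated test (print lines of B.txt whose second field is not in A.txt), which is a variation on the question rather than the question itself; the variable names are illustrative.

```shell
#!/bin/sh
dir=$(mktemp -d)
: > "$dir/A.txt"                                   # empty lookup file
printf '%s\n' 00,aa 11,bb 00,dd > "$dir/B.txt"

# NR == FNR: with an empty A.txt, B.txt is mistaken for the first file,
# so every line is stored and skipped, and nothing is printed.
with_nr=$(awk -F, 'NR == FNR { seen[$0]; next } !($2 in seen)' \
    "$dir/A.txt" "$dir/B.txt")

# FILENAME == ARGV[1]: only records that really come from A.txt are stored.
with_filename=$(awk -F, 'FILENAME == ARGV[1] { seen[$0]; next } !($2 in seen)' \
    "$dir/A.txt" "$dir/B.txt")

printf 'NR == FNR:\n%s\n' "$with_nr"
printf 'FILENAME == ARGV[1]:\n%s\n' "$with_filename"
rm -rf "$dir"
```

The first variant prints nothing, while the second prints all three lines of B.txt, which is the correct answer when A.txt is empty.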

score 0 · Answer 4 · answered Oct 12 '18 at 08:34

This can be done by reading the first file and storing the required column in an array. Remember that awk arrays are associative: they store key -> value pairs.

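#!/bin/sh

INPUTFILE="source.txt"
DATAFILE="file1.txt"

awk -v input="$INPUTFILE" -v data="$DATAFILE" 'BEGIN {
    # Store each comma-separated item of the first column of input as a key.
    # The "> 0" test stops the loop on EOF (0) and on a read error (-1).
    while ((getline < input) > 0) {
        split($1, ar, ",")
        for (i in ar) dict[ar[i]] = ""
    }
    close(input)

    # Print every line of data whose third field is one of those keys.
    while ((getline < data) > 0) {
        if ($3 in dict) print $0
    }
    close(data)
}'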

source.txt --

121 sekar osanan
321 djfsdn jiosj
423 sjvokvo sjnsvn

file1.txt --

sekar osanan 424
djfsdn jiosj 121
sjvokvo sjnsvn 321
snjsn vog 574
nvdfi aoff 934
sadaf jsdac 234
kcf dejwef 274

Output --

djfsdn jiosj 121
sjvokvo sjnsvn 321

It simply builds an array from the first column of source.txt and checks the third field of every line in file1.txt against it. Likewise, any column or operation can be handled across multiple files.

score -1 · Answer 5 · answered Mar 02 '16 at 05:37

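Another way is to call grep from awk for each line of B.txt. This starts one grep process per line, so it is slow on large files, and it assumes the field contains no single quotes. Here -F treats the value as a fixed string, -x requires it to match a whole line of A.txt, and \047 is the octal escape for a single quote:

   awk -F, -v file_name=A.txt 'system("grep -qxF -- \047" $2 "\047 " file_name) == 0' B.txt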

answered Mar 02 '16 at 05:37 by Varun
