
I have two input files.

FILE 1

123
125
123
129

and file 2

"a"|"123"|"anc"
"b"|"124"|"ind"
"c"|"123"|"su"
"d"|"122"|"aus"

Desired output:

"b"|"124"|"ind"
"d"|"122"|"aus"

How can I compare $1 from file1 with $2 from file2 and print the lines of file2 whose $2 does not appear in file1? I'm having trouble because of the double quotes (").

So how can I do the comparison while ignoring the double quotes?

bongboy

1 Answer

$ awk 'FNR==NR{a[$1]=1;next} a[$3]==0' file1 FS='["|]+' file2
"b"|"124"|"ind"
"d"|"122"|"aus"
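
To try the command above without touching your real data, here is a minimal sketch that recreates the two sample inputs from the question. It assumes a shell that provides `mktemp -d`; the names `tmp` and `result` are only illustrative and are not part of the original question.

```shell
# Recreate the two sample inputs from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 123 125 123 129 > "$tmp/file1"
printf '%s\n' '"a"|"123"|"anc"' '"b"|"124"|"ind"' '"c"|"123"|"su"' '"d"|"122"|"aus"' > "$tmp/file2"

# Same awk program as above; the output is captured in a variable and then printed.
result=$(awk 'FNR==NR{a[$1]=1;next} a[$3]==0' "$tmp/file1" FS='["|]+' "$tmp/file2")
printf '%s\n' "$result"

# Clean up the scratch directory.
rm -r "$tmp"
```

This should print the two lines of file2 whose number (124 and 122) does not occur in file1.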

How it works:

  • file1 FS='["|]+' file2

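    These arguments tell awk to read file1 first with the default field separator, then change the field separator to any run of double-quotes and vertical bars, and then read file2. An assignment such as FS=... placed between file names takes effect only when awk reaches it in the argument list.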

  • FNR==NR{a[$1]=1;next}

    FNR is the number of lines that awk has read from the current file and NR is the total number of lines read. Consequently, FNR==NR is true only while reading the first file. The commands which follow in braces are only executed for the first file.

    This creates an associative array a whose keys are the first fields of file1 and whose values are 1. The next command tells awk to skip the rest of the commands and start over on the next line.

  • a[$3]==0

    This is true only if the number in field 3 did not occur in file1. If it is true, then the default action is taken which is to print the line. (With the field separator that we have chosen, the number you are interested in is in field 3.)
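
The two ideas above (field numbering under `["|]+`, and `FNR==NR` as a test for the first file) can be checked separately. The following is a small sketch, not part of the solution itself; the scratch files `first` and `second` and the variable names are made up for illustration, and `mktemp -d` is assumed to be available.

```shell
# Part 1: how the separator ["|]+ numbers the fields of one sample line.
line='"b"|"124"|"ind"'
fields=$(printf '%s\n' "$line" |
  awk -F'["|]+' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')
printf '%s\n' "$fields"

# Part 2: FNR restarts at 1 for each file while NR keeps counting,
# so FNR==NR holds only for lines of the first file.
tmp=$(mktemp -d)
printf '%s\n' 123 125 > "$tmp/first"
printf '%s\n' x y > "$tmp/second"
counters=$(awk '{ print NR, FNR, (FNR == NR ? "first" : "later") }' "$tmp/first" "$tmp/second")
printf '%s\n' "$counters"
rm -r "$tmp"
```

Part 1 shows the number `124` in field 3. Because the line begins and ends with a double-quote, awk also reports an empty field before `b` and an empty field after `ind`. Part 2 prints `first` for the two lines of the first file and `later` for the two lines of the second.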

Alternative

$ awk 'FNR==NR{a[$1]=1;next} a[substr($2,2,length($2)-2)]==0' file1 FS='|' file2
"b"|"124"|"ind"
"d"|"122"|"aus"

This is similar to the above except that the field separator is just a vertical bar. In this case, the number that you are interested in is in field 2. We use substr to remove one character from either end of field 2 which has the effect of removing the double-quotes.
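
To see what the `substr` call does on its own, here is a short sketch that prints field 2 of one sample line before and after stripping. The variable name `stripped` is only illustrative.

```shell
# With FS='|', field 2 still carries its quotes; substr drops the first and last character.
stripped=$(printf '%s\n' '"b"|"124"|"ind"' |
  awk -F'|' '{ print $2, substr($2, 2, length($2) - 2) }')
printf '%s\n' "$stripped"
```

The first value printed is the raw field `"124"` and the second is `124`, which is the form that matches the keys stored from file1.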

John1024
  • Can you please explain how it works? I mean, why do you need $3? – bongboy May 04 '15 at 06:42
  • @bongboy Because he uses `FS='["|]+'`, the second field becomes the third for `awk`. E.g. `$3` gives `124`, `122`, etc. – Jotne May 04 '15 at 06:50
  • @bongboy I just added some explanation to the answer. With `["|]+` as the field separator, the line `"b"|"124"|"ind"` has five fields: the first is empty, the second is `b`, the third is `124`, the fourth is `ind`, and the fifth (after the trailing quote) is empty. – John1024 May 04 '15 at 06:51