
Recently I ran into some strange behavior with awk.

Say I have two files: one is empty and the other contains data.

I then apply a simple command to print the unmatched lines:

awk -v var=0  'NR==FNR{a[$var]++;next} !($var in a)' file1 file2

For example, file1 is empty and file2 contains:

a
b
v

It returns nothing, whereas it is supposed to return all the content of file2. Can someone explain how to overcome this issue?

Jahid
bongboy

1 Answer


There isn't any data in file1, so NR is still 0 when awk starts reading file2. From then on NR and FNR advance together, so FNR == NR holds for every line of file2. I'm not sure there's an easy way to fix that, either.
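To see this concretely, here is a small sketch that recreates the two files from the question in a scratch directory (the `mktemp -d` directory and the shell variable names `trace` and `orig` are only illustrative) and prints both counters for every record awk reads:

```shell
# Scratch directory with an empty file1 and a three-line file2,
# matching the files described in the question.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
: > file1
printf 'a\nb\nv\n' > file2

# Print the file name and both counters for every record read.
# file1 contributes no records, so NR and FNR are equal on every line of file2.
trace=$(awk '{ print FILENAME, NR, FNR }' file1 file2)
printf '%s\n' "$trace"

# The original command: the NR==FNR block handles every line of file2
# and ends with "next", so the second pattern is never evaluated.
orig=$(awk -v var=0 'NR==FNR{a[$var]++;next} !($var in a)' file1 file2)
printf 'original output: [%s]\n' "$orig"
```

Every line of the trace belongs to file2 and shows the same value for NR and FNR, which is why the first block consumes the whole of file2.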

You can't even use a BEGIN block to record the current file name and spot when the file name changes. The POSIX specification for awk says:

FILENAME A pathname of the current input file. Inside a BEGIN action the value is undefined. Inside an END action the value shall be the name of the last input file processed.

I think your best bet is likely to be comparing FILENAME with ARGV[1]:

awk -v var=0 'FILENAME==ARGV[1] {a[$var]++;next} !($var in a)' file1 file2
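
As a quick check of that command, the following sketch runs it twice in a scratch directory: once with an empty file1 and once with a file1 that contains the single line `b`. The directory and the shell variable names `fixed` and `filtered` are assumptions made for the demonstration only.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a\nb\nv\n' > file2

# Case 1: file1 is empty, so nothing is stored in a[] and every line
# of file2 is printed.
: > file1
fixed=$(awk -v var=0 'FILENAME==ARGV[1] {a[$var]++;next} !($var in a)' file1 file2)
printf '%s\n' "$fixed"

# Case 2: file1 holds "b", so only the lines of file2 that are not in
# file1 are printed.
printf 'b\n' > file1
filtered=$(awk -v var=0 'FILENAME==ARGV[1] {a[$var]++;next} !($var in a)' file1 file2)
printf '%s\n' "$filtered"
```

This relies on `ARGV[1]` being the first file name, which holds here because the only operands after the script are the two files.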
Jonathan Leffler
  • But logically speaking, since the array `a[]` has nothing in it, the `$var in a` test should treat every line as unmatched and print it, shouldn't it? – bongboy Jun 02 '15 at 15:57
  • While it is reading `file2`, it is busy in the `FNR == NR` block. It never gets to the `!($var in a)` code; you'd need a third file on the command line to get to that. – Jonathan Leffler Jun 02 '15 at 16:00
  • You're right, there is no easy fix. In GNU awk you could use `ARGIND==1` but like `FILENAME==ARGV[1]` it'll also fail when the first arg isn't a file name, e.g. `awk '...' FS="," file1 file2` but at least it will succeed when the same file name is repeated, e.g. `awk '...' file1 file1`. The robust approach in GNU awk 4.* is `awk 'BEGINFILE{fileNr++} fileNr==1{...}' file1 file2`. In other awks you need to do something with BEGIN and ARGV and test for `=`, and maybe use `getline`, etc. – Ed Morton Jun 02 '15 at 19:17
  • @EdMorton: If you use `-v FS=","` before the program argument, then `awk` discounts that as an argument from `ARGV`. If you place it after the script without the `-v` option, then it does get counted in `ARGV`. Further, the variable is not set until the argument is processed, so FS is not set in a BEGIN block if it appears after the script. And you can't have `-v` options after the script, it seems (`awk '{ print }' -v FS="/" /etc/passwd` fails to open file `-v`). – Jonathan Leffler Jun 02 '15 at 19:44
  • Using `-v FS=","` (or the more common `-F","`) is a completely different situation from what I'm describing. I'm talking about the fact that setting variables in the file list (and there are legitimate reasons to do that such as using a different FS for each input file or needing escape sequences not to be expanded) makes them part of `ARGV[]` and so you can't assume that `ARGV[1]` is the first file name. – Ed Morton Jun 02 '15 at 19:47
  • @EdMorton: I think we're in agreement, actually. Comments (especially those done while trying to get to the next meeting) aren't always as complete as intended. If you use `-F ","` (or `-v FS=","`) before the script, or if you use `FS=","` after the script, the (input) field separator is set. The difference is, though, that `ARGV` includes the `FS=","` when it appears after the script, but not when you specify it before the script. I'm sorry I didn't make the agreement clear enough -- I was concentrating too much on extending your comments to cover other cases. – Jonathan Leffler Jun 02 '15 at 21:19
  • That's all perfectly correct but it's all beside the point of my original comment so I think maybe I didn't make my point clear and it's just this - you cannot rely on `FILENAME==ARGV[1]` to test for the current file name being the first file name in the argument list. – Ed Morton Jun 02 '15 at 21:29
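
For awks without `BEGINFILE`, one possible reading of "do something with BEGIN and ARGV and test for `=`" is sketched below. It is an assumption about what such a workaround might look like, not code from the thread: the `BEGIN` block rewrites `ARGV` so that a `fileNr=N` assignment precedes each file operand, and awk then sets `fileNr` as it reaches each file, even an empty one. The name `fileNr` is borrowed from the GNU awk example above; `new`, `files` and the shell variable `out` are hypothetical.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'b\n' > file1
printf 'a\nb\nv\n' > file2

# FS="," sits in the file list, so ARGV[1] is "FS=," and not a file name.
out=$(awk '
BEGIN {
    n = 0
    for (i = 1; i < ARGC; i++) {
        arg = ARGV[i]
        # Operands that do not look like name=value are treated as files;
        # put a fileNr=N assignment in front of each of them.
        if (arg !~ /^[A-Za-z_][A-Za-z0-9_]*=/)
            new[++n] = "fileNr=" (++files)
        new[++n] = arg
    }
    # Install the rewritten argument list.
    for (i = 1; i <= n; i++)
        ARGV[i] = new[i]
    ARGC = n + 1
}
fileNr == 1 { a[$0]++; next }   # first file operand: remember its lines
!($0 in a)                      # later files: print lines not seen in the first
' FS="," file1 file2)
printf '%s\n' "$out"
```

Because the counter is set by an assignment operand rather than by comparing names, this sketch should not depend on `ARGV[1]` being a file, and it should also cope with an empty first file or a repeated file name.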