I am running nawk scripts on a Solaris system to get the records of file1 that are not in file2, and to find duplicate records in a file, with the following scripts:

compare:

nawk 'NR==FNR{a[$0]++;next;} !a[$0] {print"line":" FNR $0}' file1 file2

duplicate:

nawk '{a[$0]++}END{for(i in a){if(a[i]-1)print i,a[i]}}' file1

In the middle of the script I get an error message saying:

nawk: out of space in tostring on record 971360

I am using a file having 2 million records.

fedorqui
  • 1
    what is your question? Please don't make us guess ;-) If script 1 is working, then use it. Good luck. – shellter Feb 17 '14 at 15:50
  • 1
    Can the files be sorted? If so then using `comm` for the compare and `uniq` for identifying duplicates would be the normal approach. Post some sample input and expected output if you'd like help. – Ed Morton Feb 17 '14 at 16:00

1 Answer

Correct your code; your double quotes are also mismatched.

 nawk 'NR==FNR{a[$0];next;} !($0 in a){print "line:" FNR $0}' file1 file2

--edit--

For duplicates, try this:

nawk '{A[$0]++}END{for(i in A)if(A[i]>1)print i,A[i]}' file

!a[$0] --> referencing a[$0] creates an extra empty array element for every $0 that does not already exist in array a while reading the second file, which wastes memory on a large file, so it is better to test with !($0 in a).
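To see the difference, here is a minimal sketch that counts how many elements array a holds after the compare, once with each style of test. The sample data and the variable names (tmp, with_index, with_in) are made up for illustration, and it calls plain awk on the assumption that nawk behaves the same way here.

```shell
# Hypothetical sample data, not taken from the original files.
tmp=$(mktemp -d)
printf 'a\nb\n' > "$tmp/file1"
printf 'a\nc\nd\n' > "$tmp/file2"

# Testing !a[$0] references the element, so every unseen line of
# file2 is silently added to the array.
with_index=$(awk 'NR==FNR{a[$0]++;next} !a[$0]{} END{n=0; for(k in a)n++; print n}' "$tmp/file1" "$tmp/file2")

# Testing !($0 in a) only checks membership and adds nothing.
with_in=$(awk 'NR==FNR{a[$0];next} !($0 in a){} END{n=0; for(k in a)n++; print n}' "$tmp/file1" "$tmp/file2")

echo "elements after !a[\$0]:      $with_index"
echo "elements after !(\$0 in a):  $with_in"

rm -rf "$tmp"
```

With this sample, the first count includes the two lines that appear only in file2, while the second stays at the two lines read from file1.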

Akshay Hegde