I have a file with records in the following format:

,laac_repo,cntrylist,idlist,domlist,typelist
1,22DE17,BA,S6CD6728,24JA13,6A
2,12FE18,AA,S6FD7688,25DA15,7D
3,22DE17,BA,S6CD6728,24JA13,6A
4,12FE18,AA,S6FD7688,25DA15,7D

I want to remove duplicate records based on the 4th column (values such as "S6CD6728"), while skipping the first row, which is the header:

",laac_repo,cntrylist,idlist,domlist,typelist"

I have tried

awk '{a[$4]++}!(a[$4]-1)' filename

And also tried

awk 'FNR > 1 {a[$4]++}!(a[$4]-1)' filename

The expected output is:

,laac_repo,cntrylist,idlist,domlist,typelist
1,22DE17,BA,S6CD6728,24JA13,6A
2,12FE18,AA,S6FD7688,25DA15,7D

P.S. The file has more than 10 million records, so please suggest a solution that works at that size. (A script, rather than a single command, would be much appreciated.)

Darkman
Saurabh

1 Answer

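Your attempts did not set the field separator, so awk split each line on whitespace and $4 was empty on every line. With -F, the 4th comma-separated field is used as the key. What about this, which skips the header row:

awk -F, 'FNR>1 && !seen[$4]++' filename
1,22DE17,BA,S6CD6728,24JA13,6A
2,12FE18,AA,S6FD7688,25DA15,7D

Or this, which keeps the header row and matches your expected output (the header's 4th field, "idlist", is unique, so it is printed like any other first occurrence):

awk -F, '!seen[$4]++' filename
,laac_repo,cntrylist,idlist,domlist,typelist
1,22DE17,BA,S6CD6728,24JA13,6A
2,12FE18,AA,S6FD7688,25DA15,7D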
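
Since a script was requested, here is a minimal sketch of the same idea wrapped in a shell function. The function name dedupe_by_id and the file names in the usage comment are my own placeholders, not anything from your setup. It passes the first line through unconditionally, so the header survives even if its 4th field happens to match a data value. Memory use grows with the number of distinct values in column 4, not with the total line count, so whether 10 million records fit comfortably depends on how many unique IDs there are.

```shell
#!/bin/sh
# dedupe_by_id: hypothetical helper name, chosen for illustration only.
# Prints the header line unchanged, then every data line whose 4th
# comma-separated field has not appeared on an earlier line.
dedupe_by_id() {
    awk -F, '
        FNR == 1 { print; next }   # always pass the header through
        !seen[$4]++                # print only the first line for each key
    ' "$1"
}

# Usage (file names are placeholders):
#   dedupe_by_id input.csv > output.csv
```

The input order is preserved, and the file is read only once.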
Darkman