
I am looking for the pattern #type in a set of files, and as output I want the lines containing that pattern. The lines are organized as tab-separated columns:

<subject1> <#type> <object1>
<subject2> <#type> <object1>
<subject3> <#type> <object2>
<subject4> <#type> <object2>
<subject5> <#type> <object3>

For this purpose I am using the command ack-grep:

$ ack-grep "#type"

I can also use sed:

sed -n -e "/#type/p" *.nt

The problem is the duplicates I need to avoid: several lines share the same object, and I want to keep only the first line for each object. The output should be:

 <subject1> <#type> <object1>
 <subject3> <#type> <object2>
 <subject5> <#type> <object3>

Hani Goc

1 Answer


Why don't you simply use good old grep? It should be basically:

grep '#type' *.nt
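
A side note: when grep searches more than one file, it prefixes every output line with the file name. If you want the bare lines instead, grep's -h option suppresses the prefix:

grep -h '#type' *.nt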

To avoid duplicates in the <objectN> part, you can use uniq with its --skip-fields option:

grep '#type' *.nt | sort -k3,3 | uniq --skip-fields 2

Note that the output has to be sorted before it reaches uniq, because uniq only removes adjacent duplicate lines; that is what the sort -k3,3 step in the pipeline is for.
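
If you would rather avoid the sort entirely and keep the matching lines in their original order, an awk one-liner can deduplicate in a single pass. A minimal sketch, assuming the object is always the third tab-separated field:

grep -h '#type' *.nt | awk -F'\t' '!seen[$3]++'

Here seen is an associative array: the first line carrying a given object makes the expression true (and gets printed), and every later line with the same object is suppressed.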

hek2mgl
  • I can use it @hek2mgl, but the problem is the duplicates. I could save the output to a text file with grep and then write C++ code to remove the duplicates, but they'd laugh at me lol, it's not a very good way to do it – Hani Goc Jul 23 '15 at 12:06
  • Oh wait, I missed the part of your question where you want to remove duplicates. Let me add that – hek2mgl Jul 23 '15 at 12:06
  • Really interesting, thank you @hek2mgl. If it had been a Google interview, it would have gone really badly lololol. I was thinking of writing C++ code lololo – Hani Goc Jul 23 '15 at 12:21