
I am looking for the pattern #type in a set of files, and as output I want the lines containing that pattern. The lines are organized as tab-separated columns:

<subject1> <#type> <object1>
<subject2> <#type> <object1>
<subject3> <#type> <object2>
<subject4> <#type> <object2>
<subject5> <#type> <object3>

For this purpose I am using the command ack-grep:

$ ack-grep "#type"

I can also use sed:

sed -n -e "/#type/p" *.nt

The problem is the duplicates I need to avoid: several lines share the same object, and I want to keep only the first line for each object. The output should be:

 <subject1> <#type> <object1>
 <subject3> <#type> <object2>
 <subject5> <#type> <object3>

Hani Goc

1 Answer


Why don't you simply use good old grep? It should be basically:

grep '#type' *.nt
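
A side note: when grep searches more than one file, it prefixes every output line with the file name. If you want the bare lines instead, grep's -h option suppresses the prefix:

grep -h '#type' *.nt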

To avoid duplicates in the <objectN> part, you can use uniq with its --skip-fields option:

grep '#type' *.nt | sort -k3,3 | uniq --skip-fields 2

Note that the output has to be sorted before it reaches uniq, because uniq only removes adjacent duplicate lines; that is what the sort -k3,3 step in the pipeline is for.
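
If you would rather avoid the sort entirely and keep the matching lines in their original order, an awk one-liner can deduplicate in a single pass. A minimal sketch, assuming the object is always the third tab-separated field:

grep -h '#type' *.nt | awk -F'\t' '!seen[$3]++'

Here seen is an associative array: the first line carrying a given object makes the expression true (and gets printed), and every later line with the same object is suppressed.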

hek2mgl
  • I can use it @hek2mgl, but the problem is the duplicates. I could save the output to a text file with grep and then write C++ code to remove the duplicates, but they'd laugh at me lol, it's not a very good way to do it – Hani Goc Jul 23 '15 at 12:06
  • Oh wait, I missed the part of your question where you want to remove duplicates. Let me add that – hek2mgl Jul 23 '15 at 12:06
  • Really interesting, thank you @hek2mgl. If it had been a Google interview, it would have gone really badly lololol. I was thinking of writing C++ code lololo – Hani Goc Jul 23 '15 at 12:21