
I have a file with contents like this:

```
server1
server2
server1_server2
server3
server4
server5
server6
server5_server6
server7
server8
server_prod
server_test
.....
```

I'm looking for a way to remove the lines that are already included in lines containing `_`. The output should be:

```
server1_server2
server3
server4
server5_server6
server7
server8
server_prod
server_test
```

**Note:** the last two servers contain "_" but are unique, so I want to keep them.

glenn jackman
Deeshey
  • Are all pairs guaranteed not to contain any duplicates? For instance, what would you want from "server1, server2_server3, server2_server4"? – Paul Floyd Sep 20 '17 at 14:19
  • Yes, the pairs don't contain duplicates. They are just concatenated names. To make it more complex, I have server names with "_" like "server_prod01", but I don't want to delete them because they are not duplicates. – Deeshey Sep 20 '17 at 15:08
  • This looks a lot like a homework question. What have you tried so far? You also need to explain exactly what you want to achieve. It is unclear right now what algorithm you are imagining that transforms that list into the second list. Don't make us guess; tell us. –  Sep 20 '17 at 15:20
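A minimal sketch of one way to express the requirement, assuming (as the clarifying comment suggests) that a line containing `_` counts as a concatenation only when every `_`-separated part also appears as a line of its own; anything else, such as `server_prod`, is treated as an ordinary unique name. The function name `drop_covered_names` and the file name `servers.txt` are hypothetical. The file is read once and the decisions are made in the `END` block:

```shell
# Sample input from the question, saved under a hypothetical file name.
printf '%s\n' server1 server2 server1_server2 server3 server4 \
  server5 server6 server5_server6 server7 server8 \
  server_prod server_test > servers.txt

# Hypothetical helper. Assumption: a line with "_" is a concatenation only
# when every "_"-separated part is itself a line of the file; otherwise
# (e.g. server_prod) it is kept as an ordinary, unique name.
drop_covered_names() {
  awk '
    { line[NR] = $0; seen[$0] = 1 }      # remember every line, in order
    END {
      for (i = 1; i <= NR; i++) {
        n = split(line[i], part, "_")
        if (n < 2) continue              # plain name, nothing to expand
        whole = 1
        for (j = 1; j <= n; j++)
          if (!(part[j] in seen)) { whole = 0; break }
        if (whole)                       # genuine pair: mark its members
          for (j = 1; j <= n; j++) covered[part[j]] = 1
      }
      for (i = 1; i <= NR; i++)          # print in original order
        if (!(line[i] in covered)) print line[i]
    }
  ' "$1"
}

drop_covered_names servers.txt
```

On the sample data this prints the expected list from the question. Because `prod` is not a line by itself, a plain `server` line sitting next to `server_prod` would also be kept. The trade-off is that a pair whose members themselves contain `_` (e.g. `server_prod01_server3`) is not recognized as a pair, which is the same ambiguity discussed in the comments.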

1 Answer


This awk one-liner may help you:

```
awk -F'_' 'NR==FNR{if(NF>1)for(i=1;i<=NF;i++)a[$i]=1;next} !a[$0]' file file
server1_server2
server3
server4
server5_server6
server7
server8
```
Kent
  • Hi, I think that works! But I couldn't understand the logic of the code: 1- you set the delimiter to "_"; 2- you check IF the line has more than 1 field; 3- ?? I have unique server names like "server_prod3" and the code didn't delete them (I don't want to delete that server because it is unique). – Deeshey Sep 20 '17 at 14:51
  • I don't understand what you mean. If you give an example, please cover all the cases. @Deeshey – Kent Sep 20 '17 at 15:14
  • Your script works OK. But what happens if I have UNIQUE server names with "_"? I want to keep them. Another question: what does `!a[$0]` do? – Deeshey Sep 20 '17 at 15:26
  • @Deeshey my code doesn't distinguish whether `_` is part of a server name or a separator between server names. `!a[$0]` means print the line if `a[$0]==0`. – Kent Sep 20 '17 at 15:28
  • `NR==FNR` is a commonly used awk idiom to tell the first file apart from the second or later files (here the same file is passed twice). So the associative array `a` is filled during the first pass, keyed by the underscore-separated fields of every line that has more than one field. During the second pass, `!a[$0]` prints each line that isn't a key in the array built up the first time round. – Paul Floyd Sep 20 '17 at 15:41
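For readability, here is the same logic as Kent's one-liner spread over several lines with comments. It is wrapped in a hypothetical shell function `drop_pair_members` and run against a hypothetical `servers.txt` holding the question's sample data; apart from layout and comments it behaves the same as the one-liner:

```shell
# Sample input from the question, saved under a hypothetical file name.
printf '%s\n' server1 server2 server1_server2 server3 server4 \
  server5 server6 server5_server6 server7 server8 \
  server_prod server_test > servers.txt

# Hypothetical wrapper; the file is named twice so awk reads it in two passes.
drop_pair_members() {
  awk -F'_' '
    # Pass 1: NR==FNR is true only while reading the first copy of the file.
    NR == FNR {
      # A line with more than one "_"-separated field is treated as a
      # combined name; each of its fields becomes a key of a[].
      if (NF > 1)
        for (i = 1; i <= NF; i++)
          a[$i] = 1
      next   # skip the pattern below during pass 1
    }
    # Pass 2: print the whole line unless it was recorded as a field above.
    !a[$0]
  ' "$1" "$1"
}

drop_pair_members servers.txt
```

With the question's full sample this prints the eight expected lines, including `server_prod` and `server_test`: those lines are never stored as keys, only their parts `server`, `prod` and `test` are. That is also why a standalone line such as `prod` would be dropped, which is the limitation Kent mentions.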