
How can I "squeeze" repeated words, similar to squeezing repeated characters with tr -s ''?

For example, I would like to change:

hello.hello.hello.hello

to

hello
user2783132
  • 1
    Can you show us an example? – scai Oct 01 '13 at 14:02
  • 1
    How do you justify `hellohellohellohello` ==> `hello`? So `hello` should be `helo`? – devnull Oct 01 '13 at 14:08
  • 2
    First, you'll have to define what a word is. By most normal definitions, `hellohellohellohello` is a single word (that humans recognize as containing the same sub-word, `hello`, four times). If you've got to look for arbitrarily long repeats within a single contiguous block of non-space characters, you've got quite a problem on your hands — I'm not aware of any standard tools that will address the job. What will be the output for the input `banana hello hello abracadabra`? Is it `bana helo abracad`? If not, what, and why? – Jonathan Leffler Oct 01 '13 at 14:13

1 Answer


One way, using awk, is to print each word only the first time it appears on a line:

$ cat a
hello hello bye but bye yeah
hello yeah
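$ awk 'BEGIN{OFS=FS=" "}
  {  sep = ""
     for (i=1; i<=NF; i++) {
       # print each word only the first time it appears on this line
       if (!($i in a)) {printf "%s%s",sep,$i; a[$i]=$i; sep=OFS}
     }
     delete a      # forget the words seen before reading the next line
     print ""
  }' a
hello bye but yeah
hello yeah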
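
The command above drops every later repeat on a line, so the second bye disappears even though it is not next to the first one. If only adjacent repeats should collapse, which is closer to what tr -s does for characters, a minimal sketch could look like this. The name squeeze_adjacent is a hypothetical helper, and the sketch assumes whitespace-separated words:

```shell
# Hypothetical helper (not from the original answer): drop a word only
# when it is identical to the word immediately before it on the same line.
squeeze_adjacent() {
  awk '{
    out = ""; prev = ""
    for (i = 1; i <= NF; i++) {
      cur = $i ""                      # force a string comparison
      if (i == 1 || cur != prev)
        out = (i == 1 ? cur : out " " cur)
      prev = cur
    }
    print out
  }' "$@"
}

printf '%s\n' 'hello hello bye but bye yeah' | squeeze_adjacent
# prints: hello bye but bye yeah
```

Here the second bye survives because but sits between the two occurrences.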

You can change the field separator:

$ cat a
hello|hello|bye|but|bye|yeah
hello|yeah
$ awk 'BEGIN{OFS=FS="|"} {sep=""; for (i=1; i<=NF; i++) {if (!($i in a)) {printf "%s%s",sep,$i; a[$i]=$i; sep=OFS}}; delete a; print ""}' a
hello|bye|but|yeah
hello|yeah
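
The same idea can be applied to the dot-separated input from the question. The following is only a sketch: dedup_dot_words is a hypothetical helper name, FS is written as the bracket expression [.] so that the dot is matched literally in any awk, and the output is built in a variable so that no separator is left at the end of the line:

```shell
# Hypothetical helper (not from the original answer): keep the first
# occurrence of each dot-separated word on a line.
dedup_dot_words() {
  awk 'BEGIN { FS = "[.]"; OFS = "." }
  {
    out = ""; n = 0
    split("", seen)                    # empty the array for each new line
    for (i = 1; i <= NF; i++) {
      if (!($i in seen)) {
        seen[$i] = 1
        out = (n++ ? out OFS : "") $i  # separator only between words
      }
    }
    print out
  }' "$@"
}

printf '%s\n' 'hello.hello.hello.hello' | dedup_dot_words
# prints: hello
```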
fedorqui