
I am looking for the best known algorithm for removing duplicate words from a string. I can think of numerous ways of doing this, but I am looking for a solution that is known for being particularly efficient.

Let's say you have the following strings:

  • Lorem Ipsum Lorem Ipsum
  • Lorem Lorem Lorem
  • Lorem Ipsum Dolor Lorem Ipsum Dolor Lorem Ipsum Dolor

I would expect the algorithm to output, respectively:

  • Lorem Ipsum
  • Lorem
  • Lorem Ipsum Dolor

Note: I am doing this in PHP, in case anybody knows of any built-in PHP functions that can help with this.

Thanks!

chaimp

5 Answers

$arr = explode( " " , $string );
$arr = array_unique( $arr );
$string = implode(" " , $arr);
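Wrapped in a function and run on the three strings from the question, this should give the expected output. `array_unique()` keeps the first occurrence of each value and preserves its key, and `implode()` ignores the gaps left in the keys, so the surviving words stay in their original order. The function name `dedupeWords` is hypothetical, and this sketch assumes words are separated by exactly one space.

```php
<?php
// Hypothetical helper wrapping the explode/array_unique/implode approach.
// Assumes words are separated by exactly one space.
function dedupeWords(string $string): string
{
    $arr = explode(" ", $string);   // split into words
    $arr = array_unique($arr);      // keep the first occurrence of each word
    return implode(" ", $arr);      // rejoin; implode ignores gaps in the keys
}

echo dedupeWords("Lorem Ipsum Lorem Ipsum"), "\n";
echo dedupeWords("Lorem Lorem Lorem"), "\n";
echo dedupeWords("Lorem Ipsum Dolor Lorem Ipsum Dolor Lorem Ipsum Dolor"), "\n";
```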
AbiusX

Not sure about efficiency, but this should do it:

$str = implode(" ", array_unique(explode(" ", $str)));
Mārtiņš Briedis
$words = array_unique(explode(' ',$text));
echo implode(' ',$words);

If you want to make it more robust, you can use preg_split() with a pattern like [\s\W]+ instead of explode(), so that words are split on any run of whitespace or punctuation.
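A minimal sketch of the preg_split() variant, assuming plain ASCII text (the function name `dedupeWordsRegex` is hypothetical). `PREG_SPLIT_NO_EMPTY` drops the empty piece that trailing punctuation would otherwise produce. Note that punctuation is discarded and the words are rejoined with single spaces.

```php
<?php
// Hypothetical helper: split on runs of whitespace/non-word characters
// (\W already covers whitespace), drop empty pieces, then dedupe.
// Punctuation is lost; output words are joined by single spaces.
function dedupeWordsRegex(string $text): string
{
    $words = preg_split('/[\s\W]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_unique($words));
}

echo dedupeWordsRegex("Lorem,  Ipsum. Lorem Ipsum!"), "\n"; // Lorem Ipsum
```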

dynamic

Best way of doing it:

  1. Sort the words in the string
  2. Remove duplicates by iterating over the sorted words (equal words are now adjacent)
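The two steps above can be sketched in PHP as follows, with one extra step that restores the original word order. This is an illustration under stated assumptions, not a definitive implementation: `dedupeBySorting` is a hypothetical name, words are assumed to be separated by single spaces, and the arrow function needs PHP 7.4 or later. Word positions are sorted rather than the words themselves, so the kept positions can be put back in order at the end; overall cost is O(n log n).

```php
<?php
// Hypothetical helper: sort, remove adjacent duplicates, restore order.
// Assumes single-space separators.
function dedupeBySorting(string $string): string
{
    $words = explode(' ', $string);
    $order = array_keys($words);

    // Sort positions by word; ties are broken by position so the earliest
    // occurrence of each word comes first within its group.
    usort($order, function (int $a, int $b) use ($words): int {
        return strcmp($words[$a], $words[$b]) ?: $a <=> $b;
    });

    // Walk the sorted positions; keep a position only when its word differs
    // from the previous one, i.e. the first occurrence of each word.
    $keep = [];
    $prev = null;
    foreach ($order as $i) {
        if ($words[$i] !== $prev) {
            $keep[] = $i;
            $prev = $words[$i];
        }
    }

    // Restore the original order by sorting the kept positions.
    sort($keep);
    return implode(' ', array_map(fn (int $i): string => $words[$i], $keep));
}

echo dedupeBySorting("b a b c a"), "\n"; // b a c
```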

Another possibility is to use a set, if your language supports one.
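In PHP, arrays are hash tables, so their keys can act as the set. This minimal sketch (the function name `dedupeWithSet` is hypothetical, and words are assumed to be separated by single spaces) makes a single pass with expected constant-time lookups and keeps the first occurrence of each word in its original position, so no reordering step is needed.

```php
<?php
// Hypothetical helper: use array keys as a hash set of words already seen.
// Assumes single-space separators; keeps first occurrences in order.
function dedupeWithSet(string $string): string
{
    $seen = [];    // word => true, used as a set
    $result = [];
    foreach (explode(' ', $string) as $word) {
        if (!isset($seen[$word])) {
            $seen[$word] = true;
            $result[] = $word;
        }
    }
    return implode(' ', $result);
}

echo dedupeWithSet("b a b c a"), "\n"; // b a c
```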

Pablo Santa Cruz
    This is a good answer, but requires the extra step of putting the string back into its original order. – chaimp Mar 16 '11 at 20:36

You can try the code below to remove duplicate words from a sentence:

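// Apply the regex to the whole string, not to exploded words (inside a single
// word the lookahead never finds a repeat). \b restricts matches to whole words;
// a word of 2+ characters, plus the separator after it, is deleted whenever the
// same word appears again later, so the last occurrence of each word survives.
$string = preg_replace('/\b(\w{2,})\b\W*(?=.*\b\1\b)/', '', $string);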
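One thing to be aware of with a lookahead regex run over the whole string: it deletes earlier copies, so it keeps the last occurrence of each word, whereas `array_unique()` keeps the first. For the strings in the question the result is the same, but the order can differ in general. The sketch below compares the two on a small input; `dedupeRegexLast` is a hypothetical name, and the pattern only considers words of two or more ASCII word characters.

```php
<?php
// Hypothetical comparison: lookahead regex (keeps the LAST occurrence)
// versus array_unique() (keeps the FIRST occurrence).
function dedupeRegexLast(string $string): string
{
    return preg_replace('/\b(\w{2,})\b\W*(?=.*\b\1\b)/', '', $string);
}

$input = "Ipsum Lorem Ipsum";
echo dedupeRegexLast($input), "\n";                          // Lorem Ipsum
echo implode(' ', array_unique(explode(' ', $input))), "\n"; // Ipsum Lorem
```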
PCMShaper