While looking for a way to identify duplicates in an array, I stumbled upon many solutions relying on array_count_values or array_unique. But none of these solutions account for objects in the array.

array_count_values throws an E_WARNING for every value which isn't a string or an integer.

array_unique does handle elements of various types if the SORT_REGULAR flag is set. But consider the following use case.

class Foo
{
    private $value;

    public function __construct( $value )
    {
        $this->value = $value;
    }
}

$f1 = new Foo( 42 );
$f2 = $f1;
$f3 = new Foo( 42 );
$f4 = new Foo( '42' );
$f5 = new Foo( 'Bar' );
$a  = [ $f1, $f2, $f3, $f4, $f5 ];

After deduplicating with array_unique I expected to get an array with 4 elements, [ $f1, $f3, $f4, $f5 ]. But it turns out that array_unique compares loosely (==), so I got [ $f1, $f5 ], which isn't the result I need.
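The difference between the two comparisons can be seen in a minimal sketch. It reuses the Foo class from above and only relies on PHP's built-in == and === operators:

```php
<?php
// Same shape as the Foo class in the question.
class Foo
{
    private $value;

    public function __construct($value)
    {
        $this->value = $value;
    }
}

$f1 = new Foo(42);
$f2 = $f1;            // same instance as $f1
$f3 = new Foo(42);    // different instance, equal content
$f4 = new Foo('42');  // different instance, loosely equal content

var_dump($f1 == $f3);   // bool(true): same class, equal property values
var_dump($f1 == $f4);   // bool(true): 42 == '42' under loose comparison
var_dump($f1 === $f2);  // bool(true): both variables hold the same instance
var_dump($f1 === $f3);  // bool(false): two distinct instances
```

Only the strict comparison separates $f1 from $f3 and $f4, which is the behaviour wanted here.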

In my case I wrote a collection that works like a set. I can pass it some initial elements, and these elements should be validated: if one element is a duplicate, an exception has to be thrown. Because array_unique compares loosely, I came up with this solution (which can easily be adapted to deduplicate an array).

$boundN = count( $elements );
$boundM = $boundN - 1;
for ( $m = 0; $m < $boundM; $m++ )
{
    for ( $n = $m + 1; $n < $boundN; $n++ )
    {
        if ( $elements[ $m ] === $elements[ $n ] )
        {
            throw new DuplicateElementException( 'The initial values contain duplicates.' );
        }
    }
}

At least I minimized the iterations of the inner loop. One can assume that all elements already visited by the outer loop have been validated and don't have to be checked again.
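As a sketch of the adaptation mentioned above, the same strict comparison can be used to deduplicate an array. The function name strict_unique is hypothetical, and the approach is still quadratic in the worst case, because in_array() scans the result array for every element:

```php
<?php
/**
 * Returns $elements without strict (===) duplicates, reindexed from 0.
 * Hypothetical helper, not part of PHP itself.
 */
function strict_unique(array $elements): array
{
    $unique = [];
    foreach ($elements as $element) {
        // Passing true as third argument makes in_array() compare with ===.
        if (!in_array($element, $unique, true)) {
            $unique[] = $element;
        }
    }

    return $unique;
}

$o      = new stdClass();
$result = strict_unique([42, '42', $o, $o, new stdClass(), 42]);

// Keeps 42, '42', $o and the second stdClass instance.
var_dump(count($result)); // int(4)
```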

My question is: is there a shorter algorithm, comparable to algorithms like Quick Search or something similar?

codekandis

2 Answers

In your example, it's the specific instance of each object which is unique. The spl_object_id function (available since PHP 7.2) returns a unique identifier for each live object, and you can use those identifiers as keys in an associative array to collapse duplicates. There are a few shorthand ways to write it, but a self-contained example might be:

<?php
class Foo {
    private $data;

    public function __construct($data) {
        $this->data = $data;
    }
}

$f1 = new Foo( 42 );
$f2 = $f1;
$f3 = new Foo( 42 );
$f4 = new Foo( '42' );
$f5 = new Foo( 'Bar' );
$a  = [ $f1, $f2, $f3, $f4, $f5 ];
$b = obj_unique($a);

print_r($b);

function obj_unique(array $not_unique) {
    $tmp = [];
    foreach($not_unique as $value) {
        $tmp[spl_object_id($value)] = $value;
    }
    return array_values($tmp);
}

This produces the following output, which no longer contains the duplicate instance.

Array
(
    [0] => Foo Object
        (
            [data:Foo:private] => 42
        )

    [1] => Foo Object
        (
            [data:Foo:private] => 42
        )

    [2] => Foo Object
        (
            [data:Foo:private] => 42
        )

    [3] => Foo Object
        (
            [data:Foo:private] => Bar
        )

)

This idea could be trivially modified to throw an exception if the array already contains the key.

if(contains_duplicates($a)) {
    throw new Exception("Duplicates are bad etc etc ...");
}

function contains_duplicates(array $test) {
    $tmp = [];
    foreach($test as $value) {
        $key = spl_object_id($value);
        if(array_key_exists($key, $tmp)) {
            // duplicates
            return true;
        }
        $tmp[$key] = $value;
    }
    // no duplicates
    return false;
}

The === operator behaves the same way on objects: it is an instance-wise comparison, not a comparison of the objects' contents, which is something you should be aware of.
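If the array can also hold scalars, one possible extension is to build a type-prefixed string key per value, so that 42 and '42' stay distinct just as they do under ===. This is only a sketch: the function names are hypothetical, and it assumes the values are objects, integers, strings, booleans or null (floats and nested arrays would need extra care):

```php
<?php
/**
 * Builds a key that differs whenever two values differ under ===.
 * Assumption: $value is an object, int, string, bool or null.
 */
function identity_key($value): string
{
    if (is_object($value)) {
        return 'object:' . spl_object_id($value);
    }

    // gettype() keeps 42 ("integer") and '42' ("string") apart.
    return gettype($value) . ':' . var_export($value, true);
}

function contains_strict_duplicates(array $values): bool
{
    $seen = [];
    foreach ($values as $value) {
        $key = identity_key($value);
        if (isset($seen[$key])) {
            return true;
        }
        $seen[$key] = true;
    }

    return false;
}

$o = new stdClass();
var_dump(contains_strict_duplicates([42, '42', $o, new stdClass()])); // bool(false)
var_dump(contains_strict_duplicates([42, '42', $o, $o]));             // bool(true)
```

A single pass with hash lookups replaces the nested loops, at the cost of one temporary array.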

mike42
  • This is something I call dope. I didn't know about `spl_object_id()`, which made it impossible for me to hashmap objects. BUT: I didn't mention that there can be scalar types in the array. That doesn't matter, though, since I can adapt your solution: I need two temporary arrays, one for scalar values and one for object IDs. And yes, I'm aware of the strict comparison; I wanted an instance-wise comparison. – codekandis May 30 '18 at 10:24

This looks like the XY problem.

Since your code is looking for duplicate instances (===) rather than just objects containing the same data, these objects must be instantiated at run time. Since you are using a numerically indexed array, you are presumably not concerned with preserving information in the array index. Hence the most appropriate solution would be to index the array in a way that ensures uniqueness as you add entries:

 $f1 = new Foo( 42 );
 $f2 = $f1;
 $f3 = new Foo( 42 );
 $f4 = new Foo( '42' );
 $f5 = new Foo( 'Bar' );
 $a  = [ 
   spl_object_hash($f1)=>$f1, 
   spl_object_hash($f2)=>$f2, 
   spl_object_hash($f3)=>$f3, 
   spl_object_hash($f4)=>$f4, 
   spl_object_hash($f5)=>$f5 
   ];
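A minimal sketch of this idea wrapped in a collection might look as follows. The class name ObjectSet and its methods are hypothetical and not taken from the question; the sketch assumes only objects are stored and requires PHP 7.2 or later for the object type hint. It rejects duplicates on insertion and still iterates with keys 0..n-1:

```php
<?php
// Hypothetical set-like collection keyed by spl_object_hash().
class ObjectSet implements IteratorAggregate, Countable
{
    private $items = [];

    public function add(object $object): void
    {
        $hash = spl_object_hash($object);
        if (isset($this->items[$hash])) {
            throw new InvalidArgumentException('Duplicate element.');
        }
        // Storing the object keeps it alive, so its hash stays unique.
        $this->items[$hash] = $object;
    }

    public function getIterator(): Iterator
    {
        // array_values() exposes the elements with numeric keys 0..n-1.
        return new ArrayIterator(array_values($this->items));
    }

    public function count(): int
    {
        return count($this->items);
    }
}

$set   = new ObjectSet();
$first = new stdClass();
$set->add($first);
$set->add(new stdClass());

$thrown = false;
try {
    $set->add($first); // same instance again
} catch (InvalidArgumentException $e) {
    $thrown = true;
}
```

Implementing IteratorAggregate means the class does not have to track a position itself; the ArrayIterator does that.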
symcbean
  • Please take a look at the comment below @mike42's solution. The array can consist of several elements of several types. In my real-world problem I do care about the indices, because I implemented the iterator interface to iterate over my collection. This forces me to index my array numerically in the order [0..n]. – codekandis May 30 '18 at 11:45
  • ...and as you said in your comment, that the array contains different things other than objects is far from insurmountable. You seem to be saying that you need a numeric index on your array - but this was not stated in the question and is not required to implement an iterator. – symcbean May 30 '18 at 12:40
  • @symcbean "and is not required to implement an iterator" Why not? I have to store the position somewhere, and it directly refers to the array indices. If `next()` is called I have to increment my position up to the maximum index, which is checked with `valid()`, and so on. If you have a better solution please show me. It might make my code a lot easier. – codekandis May 30 '18 at 12:47
  • You could use the example in the manual: http://php.net/manual/en/language.oop5.iterations.php – symcbean May 30 '18 at 13:04
  • @symcbean There was a specific reason I once decided to store the position in an internal field instead of using `key()`. I have to dig into it again. If I find out, I'll let you know. – codekandis May 30 '18 at 13:11