3

I am trying to find a way to search through multiple arrays, pick out every entry that appears in more than one of them, and display which arrays each duplicate entry was found in.

example:

$array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');
$array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');
$array3 = array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com');

Then the output would display something like:

domain.com => array 1, array 2, array 3
domain5.com => array 1, array 3

Thanks in advance for any suggestions

Jon Eichler
  • What have you tried already, what problems did you find, and what do you need help with? Answers to this are likely to be opinion-based, which doesn't fit very well with the way the site works. – Andy Clifton Sep 08 '15 at 01:23
  • The fact that you say "best way" makes me agree with the above. You can easily check if an element is in an array using `in_array()`. From there, try something and ask a question when you get stuck. – scrowler Sep 08 '15 at 01:29
  • I think you have the answer in your question. You want something like `domain.com => array 1, array 2, array 3`. So build an array with domains as keys. Iterate your 3 (or more) arrays and populate the new array while you iterate. You'll have to iterate each array exactly once. Your final structure will look something like this `array( 'domain.com' => array( 'array 1', 'array 2', 'array 3'), 'domain5.com' => ... )` – mkasberg Sep 08 '15 at 01:37
  • Does it have to be the 'best way' or will 'useful ways' be adequate or, even, just 'ways'? – Ryan Vincent Sep 08 '15 at 01:48
  • Will there always be 3 arrays in total, or can that count differ? – viral Sep 08 '15 at 07:27

7 Answers

1

The idea behind this code is simple :) For each entry in each of the arrays provided, the function first records an artificial name for the containing array ("array 1", "array 2", ...) in the $raw array, and then drops the entries that were not recorded for more than one array.

<?php
function duplicates() {
    $raw = array();
    $args = func_get_args();
    $i = 1;
    foreach($args as $arg) {
        if(is_array($arg)) {
            foreach($arg as $value) {
                $raw[$value][] = "array $i";
            }
            $i++;
        }
    }

    $out = array();
    foreach($raw as $key => $value) {
        if(count($value)>1)
            $out[$key] = $value;
    }
    return $out;
}

echo '<pre>';
print_r(
    duplicates(
        array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com'),
        array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com'),
        array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com')
    )
);
echo '</pre>';
?>

Because it uses func_get_args(), you can pass an arbitrary number of input arrays to the duplicates() function above. Here is the output of the code above:

Array
(
    [domain.com] => Array
        (
            [0] => array 1
            [1] => array 2
            [2] => array 3
        )

    [domain5.com] => Array
        (
            [0] => array 1
            [1] => array 2
            [2] => array 3
        )

    [domaindd5.com] => Array
        (
            [0] => array 1
            [1] => array 3
        )

)
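
A small variation on the function above, offered as a sketch rather than a drop-in replacement. It assumes PHP 5.6 or later for the `...` variadic syntax, and the name `duplicatesAcross` is made up for this example. It de-duplicates each input array first, so a value repeated inside a single array is not reported as a cross-array duplicate.

```php
<?php
// Hypothetical helper: maps each value to the names of the input arrays
// that contain it, keeping only values found in two or more arrays.
function duplicatesAcross(array ...$arrays) {
    $seen = array();
    foreach ($arrays as $index => $array) {
        // array_unique() so that a repeat inside one array counts once
        foreach (array_unique($array) as $value) {
            $seen[$value][] = 'array ' . ($index + 1);
        }
    }
    // Keep only the values that were recorded for more than one array
    return array_filter($seen, function ($names) {
        return count($names) > 1;
    });
}

$result = duplicatesAcross(
    array('a.com', 'a.com', 'b.com'),
    array('b.com', 'c.com'),
    array('c.com', 'b.com')
);
print_r($result);
```

Here 'a.com' occurs twice but only inside the first array, so it is left out of the result, while 'b.com' and 'c.com' are reported.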
someOne
0

Your problem actually got me really interested (a bit too interested). The solution iterates through sorted arrays, and it does not matter how long the arrays are. You may need to adapt the code to your input, but the basic concepts are the same. You could also make it more general by adding a foreach loop to do the pairings for you (explained below). This is what I was able to come up with:

Proposed O(n log n) algorithm

The other solution and a few others online show ways of doing this with O(n^2) complexity. I think we can do better, since this can be solved along the lines of the divide-and-conquer process used in many O(n log n) sorts.

Quick Summary

This is an O(n log n) algorithm that sorts the arrays and then walks across them, each time advancing the array pointer in the array with the lower current() element, looking for duplicates. The sorting, done with PHP's sort function, is the O(n log n) part (on average).

Sorting the arrays

Each array is sorted in place with sort(). This step is O(n log n) on average, with n being the number of elements in the array being sorted. Each array is sorted on its own, so the arrays do not all have to be the same length.

<?php
$array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');
$array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');
$array3 = array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com');
sort($array1);
sort($array2);
sort($array3);

Iterating through to find duplicates

This part is a bit iffy (I'm hoping some comp-sci expert can help make it faster, since I think it can be). The arrays are now sorted, so how many iterations do we need? It depends on the data. A fixed count based on the length of one array is not enough: the loop might end before we have finished advancing the pointer in the other array, which may hold many small elements, and we would miss matches. The loop therefore has to be driven by the array pointers and must keep going until no possible match remains. (In this example array1 holds the largest element. You could check that with max(), but here you can tell by eye, since every element begins with "domain" and letters sort after digits.)

Next, every pair of arrays has to be compared (array1 -> array2, array1 -> array3, array2 -> array3). On each iteration we compare the current element of one array with the current element of the other and advance the pointer in the array whose current element is smaller. This guarantees that every element is visited until the two current elements are equal, which is the else block in the code. You can read more about this strategy: Algorithm to tell if two arrays have identical members

After each while loop completes, we reset the array pointers to prepare for the next comparison.

$duplicates = array();
reset($array1);
reset($array2);
// key() returns null once a pointer has moved past the last element,
// so each loop runs until either array is exhausted
while (key($array1) !== null && key($array2) !== null) {
    if (current($array1) > current($array2)) {
        next($array2);
    }
    elseif (current($array1) < current($array2)) {
        next($array1);
    }
    else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array1)])) {
            array_push($duplicates[current($array1)], 'array1', 'array2');
        }
        else {
            $duplicates[current($array1)] =  array('array1', 'array2');
        }
        next($array1);
        next($array2);
    }
}
reset($array1);
reset($array3);
while (key($array1) !== null && key($array3) !== null) {
    if (current($array1) > current($array3)) {
        next($array3);
    }
    elseif (current($array1) < current($array3)) {
        next($array1);
    }
    else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array1)])) {
            array_push($duplicates[current($array1)], 'array1', 'array3');
        }
        else {
            $duplicates[current($array1)] = array('array1', 'array3');
        }
        next($array1);
        next($array3);
    }
}
reset($array2);
reset($array3);
while (key($array2) !== null && key($array3) !== null) {
    if (current($array2) > current($array3)) {
        next($array3);
    }
    elseif (current($array2) < current($array3)) {
        next($array2);
    }
    else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array2)])) {
            array_push($duplicates[current($array2)], 'array2', 'array3');
        }
        else {
            $duplicates[current($array2)] =  array('array2', 'array3');
        }
        next($array2);
        next($array3);
    }
}
// The same array name may have been pushed more than once per value
foreach ($duplicates as $key=>$array) {
    $duplicates[$key] = array_unique($array);
}
print_r($duplicates);

The lists in $duplicates need to be de-duplicated with array_unique, since the same names ("array1", "array3", and so on) get pushed more than once. Once this is done, we have every duplicated element along with the arrays it appears in.
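
The idea of letting a loop do the pairings can be sketched as follows. This is only an illustration under stated assumptions: the helper name `sortedIntersection` and the shortened sample data are made up here, and integer indexes are used in place of the internal array pointers. Note that comparing every pair means the work grows with the number of pairs of arrays.

```php
<?php
// Hypothetical helper: values common to two SORTED lists, found by
// advancing an index in whichever list has the smaller current value.
function sortedIntersection(array $a, array $b) {
    $common = array();
    $i = 0;
    $j = 0;
    while ($i < count($a) && $j < count($b)) {
        $cmp = strcmp($a[$i], $b[$j]);
        if ($cmp < 0) {
            $i++;
        } elseif ($cmp > 0) {
            $j++;
        } else {
            $common[] = $a[$i];
            $i++;
            $j++;
        }
    }
    return $common;
}

// Shortened sample data, keyed by a display name for each array
$arrays = array(
    'array1' => array('domain.com', 'domain1.com', 'domain5.com', 'domaindd5.com'),
    'array2' => array('domain.com', 'domain12.com', 'domain5.com'),
    'array3' => array('domain.com', 'domain5.com', 'domaindd5.com'),
);
foreach ($arrays as &$list) {
    sort($list, SORT_STRING); // same byte-wise order that strcmp() uses
}
unset($list);

$names = array_keys($arrays);
$duplicates = array();
// Compare every pair of arrays exactly once
for ($x = 0; $x < count($names); $x++) {
    for ($y = $x + 1; $y < count($names); $y++) {
        $common = sortedIntersection($arrays[$names[$x]], $arrays[$names[$y]]);
        foreach ($common as $value) {
            // Using the names as keys avoids recording a name twice
            $duplicates[$value][$names[$x]] = true;
            $duplicates[$value][$names[$y]] = true;
        }
    }
}
// Turn the name sets into plain lists
$duplicates = array_map('array_keys', $duplicates);
print_r($duplicates);
```

Because the array names are stored as keys, no array_unique pass is needed at the end.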

tl;dr and final notes

The full code is below and you can run it here to ensure you get the same results

<?php
$array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');
$array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');
$array3 = array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com');
sort($array1);
sort($array2);
sort($array3);
$duplicates = array();
reset($array1);
reset($array2);
// key() returns null once a pointer has moved past the last element,
// so each loop runs until either array is exhausted
while (key($array1) !== null && key($array2) !== null) {
    if (current($array1) > current($array2)) {
        next($array2);
    }
    elseif (current($array1) < current($array2)) {
        next($array1);
    }
    else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array1)])) {
            array_push($duplicates[current($array1)], 'array1', 'array2');
        }
        else {
            $duplicates[current($array1)] =  array('array1', 'array2');
        }
        next($array1);
        next($array2);
    }
}
reset($array1);
reset($array3);
while (key($array1) !== null && key($array3) !== null) {
    //echo 'current value of array1 :' . current($array1) . ' current value of array3: ' . current($array3). '<br/>';
    if (current($array1) > current($array3)) {
        next($array3);
    }
    elseif (current($array1) < current($array3)) {
        next($array1);
    }
    else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array1)])) {
            array_push($duplicates[current($array1)], 'array1', 'array3');
        }
        else {
            $duplicates[current($array1)] = array('array1', 'array3');
        }
        next($array1);
        next($array3);
    }
}
reset($array2);
reset($array3);
while (key($array2) !== null && key($array3) !== null) {
    if (current($array2) > current($array3)) {
        next($array3);
    }
    elseif (current($array2) < current($array3)) {
        next($array2);
    }
    else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array2)])) {
            array_push($duplicates[current($array2)], 'array2', 'array3');
        }
        else {
            $duplicates[current($array2)] =  array('array2', 'array3');
        }
        next($array2);
        next($array3);
    }
}
// The same array name may have been pushed more than once per value
foreach ($duplicates as $key=>$array) {
    $duplicates[$key] = array_unique($array);
}
print_r($duplicates);
?>
q.Then
0

From the $array1, $array2, $array3 arrays shown here, I am assuming there is no repetition within the same array.

So there are two tasks to be carried out:

  • Find the values that are duplicated across the arrays.
  • Build an array naming the arrays in which each repetition occurs.

Here is my effort:

$final = []; // initialize the final array

foreach(array_merge($array1,$array2,$array3) as $domain)
    $final[] = $domain; // group all array's elements
unset($domain);         // unset garbage after foreach execution

Now, we have a list of all arrays' elements gathered in $final.

$final = array_count_values($final); // find repetition and its count

$final = array_diff($final, [1]);    // remove single occurrences

Now, $final will look like this, with the domain name as the key and the number of times it occurs across all arrays as the value.

array (
  'domain.com' => 3,
  'domain5.com' => 3,
  'domaindd5.com' => 2,
)

Now, find where the repetition occurs in the given 3 arrays:

foreach($final as $domain => &$count)
{
    $count = []; // make count an array

    $temp1 = in_array($domain, $array1); // check if it is in $array1
    $temp2 = in_array($domain, $array2); // check if it is in $array2
    $temp3 = in_array($domain, $array3); // check if it is in $array3

    if($temp1 !== false) // if in array then fill array name
        $count[] = 'array1';
    if($temp2 !== false)
        $count[] = 'array2';
    if($temp3 !== false)
        $count[] = 'array3';
}
unset($domain, $count); // unset garbage

That's all. Your $final array will look like this:

array (
  'domain.com' => 
  array (
    0 => 'array1',
    1 => 'array2',
    2 => 'array3',
  ),
  'domain5.com' => 
  array (
    0 => 'array1',
    1 => 'array2',
    2 => 'array3',
  ),
  'domaindd5.com' => 
  array (
    0 => 'array1',
    1 => 'array3',
  ),
)

See it in action, here

viral
0

If you like functional programming, here is a somewhat concise method / one-liner:

Code: (Demo)

$array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');
$array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');
$array3 = array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com');

var_export(array_filter(array_merge_recursive(array_fill_keys($array1,'array1'),array_fill_keys($array2,'array2'),array_fill_keys($array3,'array3')),'is_array'));

Output:

array (
  'domain.com' => 
  array (
    0 => 'array1',
    1 => 'array2',
    2 => 'array3',
  ),
  'domain5.com' => 
  array (
    0 => 'array1',
    1 => 'array2',
    2 => 'array3',
  ),
  'domaindd5.com' => 
  array (
    0 => 'array1',
    1 => 'array3',
  ),
)

I am not really qualified to comment on the O(n log n) performance, but I do think this approach is pretty valuable in terms of brevity.

Here is the breakdown into multiple lines:

var_export(
    array_filter(
        array_merge_recursive(
            array_fill_keys($array1,'array1'), // ["domain.com"=>"array1","domain1.com"=>"array1",...]
            array_fill_keys($array2,'array2'),
            array_fill_keys($array3,'array3')
        ),
        'is_array'
    )
);
  • array_fill_keys() will generate an array with "[...].com" values as keys, and the "array variable names" as the static values.
  • array_merge_recursive() will combine the three generated arrays into one array. The duplicate keys will have their values merged into subarrays while unique keys will have their data stored as a string.
  • array_filter() will simply remove the unique "[...].com" occurrences by retaining only values with a data type of array.
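
To make the intermediate step visible, here is a minimal sketch with two tiny made-up lists (`$first` and `$second` are not from the question). It prints the merged array before and after filtering. One caveat worth checking against your own data: values that look like integers would become integer keys, which array_merge_recursive() appends instead of merging.

```php
<?php
// Two tiny input lists, made up for this illustration
$first  = array('a.com', 'b.com');
$second = array('b.com', 'c.com');

$merged = array_merge_recursive(
    array_fill_keys($first, 'array1'),
    array_fill_keys($second, 'array2')
);
// 'b.com' is a key in both inputs, so its values were merged into a list;
// 'a.com' and 'c.com' appear once and keep their plain string value.
var_export($merged);

// Only the merged lists (the duplicates) survive the is_array filter
$duplicates = array_filter($merged, 'is_array');
var_export($duplicates);
```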
mickmackusa
0

http://docs.php.net/array_intersect

Returns an array containing all of the values in array1 whose values exist in all of the parameters.

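<?php
    $array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');

    $array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');

    $array3 = array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com');

    // values of $array1 that are also present in both $array2 and $array3
    $duplicate = array_intersect(array_intersect($array1,$array2),$array3);

?>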

Result

print_r($duplicate);

Array ( [0] => domain.com [4] => domain5.com )
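
Note that array_intersect() only returns values present in every array, so a value shared by just two of the arrays is not reported. A rough sketch of how the same function could be applied pair by pair to also record which arrays share each value is shown here. The `$arrays` map, its shortened data, and the `$found` variable are assumptions made for this example.

```php
<?php
// Shortened sample data, keyed by a display name for each array
$arrays = array(
    'array1' => array('domain.com', 'domain5.com', 'domaindd5.com'),
    'array2' => array('domain.com', 'domain5.com'),
    'array3' => array('domain.com', 'domaindd5.com'),
);

$found = array();
foreach ($arrays as $name => $list) {
    foreach ($arrays as $otherName => $otherList) {
        if ($name === $otherName) {
            continue; // do not compare an array with itself
        }
        // Values of $list that also occur in $otherList
        foreach (array_intersect($list, $otherList) as $value) {
            $found[$value][$name] = true;
        }
    }
}
// Turn the name sets into plain lists
$found = array_map('array_keys', $found);
print_r($found);
```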

0

Here is another way of doing this, smaller and simpler than the other answers, that allows verbose output. It obviously needs a bit more wrapping for a specific use case, but hopefully you can see the methodology.

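// key the arrays by number so the output matches $array1, $array2, $array3
$mar = [1 => $array1, 2 => $array2, 3 => $array3];
foreach($mar as $i => $testAr){
    // compare array $i with every later array $ii
    for ($ii=$i+1; $ii <= count($mar); $ii++) { 
        foreach($mar[$ii] as $val){
            if (in_array($val, $testAr)){
                echo "$val in array $i and $ii<br>";
            }
        }
    }
}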
Jamie Robinson
0

$array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');
$array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');
$array3 = array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com');
$commonValue = array_intersect($array1, $array2,$array3);
print_r($commonValue); 
Mitali Patel