Your problem actually got me really interested (a bit too interested). The solution iterates through sorted arrays, and it works regardless of the arrays' lengths.
You may need to adapt the code to your input, but the basic concepts are the same. You can also make it more general by using a foreach
loop to do the pairings for you. This is what I was able to come up with:
Proposal: O(n log n) complexity algorithm
The other solution and a few others online show some ways of doing this with O(n^2) complexity. I think we can do better, though, since this can be solved in a way similar to the merge step of the divide-and-conquer process used in many O(n log n) sorts.
Quick Summary
This is an O(n log n) runtime algorithm that sorts the arrays and then walks across them in pairs, each time advancing the array pointer of the array with the lower current() element, looking for duplicates. The sorting is O(n log n), using PHP's sort function; each pairwise walk afterwards is linear.
Sorting the arrays
We sort each of the three arrays with sort. Each sort is O(n log n), with n being the number of elements in the array being sorted. The arrays are sorted independently of one another, so they do not all have to be the same length.
<?php
$array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');
$array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');
$array3 = array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com');
sort($array1);
sort($array2);
sort($array3);
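If you have more than three arrays, calling sort on each one by hand gets tedious. Here is a minimal sketch of sorting them in a loop. It assumes you collect the inputs in a hypothetical $arrays container keyed by a label of your choosing; neither the variable nor the shortened sample data comes from the code above.

```php
<?php
// Hypothetical container: label => list of domains (names chosen for illustration).
$arrays = array(
    'array1' => array('domain3.com', 'domain.com', 'domaindd5.com'),
    'array2' => array('domain5.com', 'domain.com'),
);

// Iterate by reference so sort() modifies the stored arrays, not copies of them.
foreach ($arrays as &$list) {
    sort($list);
}
unset($list); // break the reference so later code cannot alter the last array by accident

print_r($arrays);
```

Because every array is sorted on its own, differing lengths are not a problem.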
Iterating through to find duplicates
This part is a bit iffy (I'm hoping some comp-sci expert can help me out here to make it faster, since I think it can be faster). We have sorted the arrays. How many iterations do we need? Well, the answer is: it depends! If we are iterating over array1 to check for duplicates in array2, we have to keep going until every element that could still have a match has been visited, and that is decided by the values, not by the element counts. In this example array1 holds the largest element (you can find it with max() in PHP, but here you can tell by inspection: all the elements begin with domain, and letters compare greater than digits), so array2 is the array that runs out first. A loop bounded only by a length (say, the number of elements in the array that holds the greatest element) would miss matches, since it might end while we are still advancing the pointer of the other array, which may contain many small elements.
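To make the comparison rule concrete, here is a small sketch using the first two sample arrays. It relies on the fact that PHP compares non-numeric strings byte by byte, which is also what max() does by default.

```php
<?php
$array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');
$array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');

// max() returns the greatest element under PHP's standard comparison.
var_dump(max($array1)); // string(13) "domaindd5.com"
var_dump(max($array2)); // string(11) "domain5.com"

// The first differing character decides: 'd' (a letter) is greater than '5' (a digit).
var_dump('domaindd5.com' > 'domain5.com'); // bool(true)
```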
Now we need to compare every pair of arrays (array1 -> array2, array1 -> array3, array2 -> array3). On each iteration we compare the current elements of the two arrays and advance the pointer of whichever array has the smaller current element. This guarantees that no element is skipped before a possible match for it has been checked. When the two current elements are equal we land in the else block, record the duplicate, and advance both pointers. You can read more about this strategy: Algorithm to tell if two arrays have identical members
After each while loop completes, we reset the array pointers of the two arrays involved in the next comparison, so that it starts from the first element.
$duplicates = array();
// Compare array1 with array2.
// key() returns null once a pointer has moved past the last element,
// so each loop stops as soon as either array is exhausted.
reset($array1);
reset($array2);
while (key($array1) !== null && key($array2) !== null) {
    if (current($array1) > current($array2)) {
        next($array2);
    } elseif (current($array1) < current($array2)) {
        next($array1);
    } else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array1)])) {
            array_push($duplicates[current($array1)], 'array1', 'array2');
        } else {
            $duplicates[current($array1)] = array('array1', 'array2');
        }
        next($array1);
        next($array2);
    }
}
// Compare array1 with array3.
reset($array1);
reset($array3);
while (key($array1) !== null && key($array3) !== null) {
    if (current($array1) > current($array3)) {
        next($array3);
    } elseif (current($array1) < current($array3)) {
        next($array1);
    } else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array1)])) {
            array_push($duplicates[current($array1)], 'array1', 'array3');
        } else {
            $duplicates[current($array1)] = array('array1', 'array3');
        }
        next($array1);
        next($array3);
    }
}
// Compare array2 with array3.
reset($array2);
reset($array3);
while (key($array2) !== null && key($array3) !== null) {
    if (current($array2) > current($array3)) {
        next($array3);
    } elseif (current($array2) < current($array3)) {
        next($array2);
    } else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array2)])) {
            array_push($duplicates[current($array2)], 'array2', 'array3');
        } else {
            $duplicates[current($array2)] = array('array2', 'array3');
        }
        next($array2);
        next($array3);
    }
}
foreach ($duplicates as $key=>$array) {
$duplicates[$key] = array_unique($array);
}
print_r($duplicates);
The label lists stored in $duplicates need to be made unique with array_unique, since the same labels ("array1", "array3", and so on) get pushed more than once for a value that appears in several arrays. Once this is complete, we will have reached all the duplicated elements.
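The three loops differ only in which pair of arrays they compare, so the pairing can be driven by a foreach. What follows is a sketch under stated assumptions, not a drop-in replacement: intersect_sorted() is a hypothetical helper (not a PHP built-in) that walks two already-sorted lists with integer indexes instead of the internal array pointers, and $arrays is a hypothetical label => list container.

```php
<?php
// Hypothetical helper: returns the values present in both sorted lists.
// Assumes both inputs were sorted with sort(), so they have sequential integer keys.
function intersect_sorted(array $a, array $b)
{
    $i = 0;
    $j = 0;
    $common = array();
    // Stop when either list is exhausted: nothing left in the other one can match.
    while ($i < count($a) && $j < count($b)) {
        if ($a[$i] > $b[$j]) {
            $j++; // $b holds the smaller current element
        } elseif ($a[$i] < $b[$j]) {
            $i++; // $a holds the smaller current element
        } else {
            $common[] = $a[$i];
            $i++;
            $j++;
        }
    }
    return $common;
}

$arrays = array(
    'array1' => array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com'),
    'array2' => array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com'),
    'array3' => array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com'),
);
foreach ($arrays as &$list) {
    sort($list);
}
unset($list);

$duplicates = array();
$labels = array_keys($arrays);
// Visit every unordered pair (array1/array2, array1/array3, array2/array3).
foreach ($labels as $x => $first) {
    foreach (array_slice($labels, $x + 1) as $second) {
        foreach (intersect_sorted($arrays[$first], $arrays[$second]) as $value) {
            // Storing labels as keys makes a separate array_unique() pass unnecessary.
            $duplicates[$value][$first] = true;
            $duplicates[$value][$second] = true;
        }
    }
}
// Turn the label sets back into plain lists.
$duplicates = array_map('array_keys', $duplicates);
print_r($duplicates);
```

With this shape, adding a fourth array only means adding another entry to $arrays; the number of pairwise passes then grows with the number of pairs.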
tl;dr and final notes
The full code is below and you can run it here to ensure you get the same results
<?php
$array1 = array('domain.com','domain1.com','domain2.com','domain3.com','domain5.com','domaindd5.com');
$array2 = array('domain.com','domain12.com','domain22.com','domain32.com','domain42.com','domain5.com');
$array3 = array('domain.com','domain31.com','domain332.com','domain33.com','domain5.com','domaindd5.com');
sort($array1);
sort($array2);
sort($array3);
$duplicates = array();
// Compare array1 with array2.
// key() returns null once a pointer has moved past the last element,
// so each loop stops as soon as either array is exhausted.
reset($array1);
reset($array2);
while (key($array1) !== null && key($array2) !== null) {
    if (current($array1) > current($array2)) {
        next($array2);
    } elseif (current($array1) < current($array2)) {
        next($array1);
    } else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array1)])) {
            array_push($duplicates[current($array1)], 'array1', 'array2');
        } else {
            $duplicates[current($array1)] = array('array1', 'array2');
        }
        next($array1);
        next($array2);
    }
}
// Compare array1 with array3.
reset($array1);
reset($array3);
while (key($array1) !== null && key($array3) !== null) {
    if (current($array1) > current($array3)) {
        next($array3);
    } elseif (current($array1) < current($array3)) {
        next($array1);
    } else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array1)])) {
            array_push($duplicates[current($array1)], 'array1', 'array3');
        } else {
            $duplicates[current($array1)] = array('array1', 'array3');
        }
        next($array1);
        next($array3);
    }
}
// Compare array2 with array3.
reset($array2);
reset($array3);
while (key($array2) !== null && key($array3) !== null) {
    if (current($array2) > current($array3)) {
        next($array3);
    } elseif (current($array2) < current($array3)) {
        next($array2);
    } else {
        //Array intersection, values are matching
        if (isset($duplicates[current($array2)])) {
            array_push($duplicates[current($array2)], 'array2', 'array3');
        } else {
            $duplicates[current($array2)] = array('array2', 'array3');
        }
        next($array2);
        next($array3);
    }
}
foreach ($duplicates as $key=>$array) {
$duplicates[$key] = array_unique($array);
}
print_r($duplicates);
?>