What I'm trying to do is take items from one array, such as:

$array1 = array(
"google-com",
"youtube-com",
);

And remove items from a second array if they contain any of the above items (a BROAD match, not an exact one).

$array2 = array(
"www-google-com",
"www-youtube-com",
"www-facebook-com",
"www-twitter-com",
);

Expected output:

 www-facebook-com
 www-twitter-com

Note: The first array would be with "example.com" style URLs and the second with "https://www.example.com/" URLs.

It seems array_diff only works with exact matches, and after much searching, I can't seem to find a way to make it work for broad matches.

Thanks for your help!

Joe
  • Are you not able to better prepare one of the arrays? Is this sample data realistic data in terms of your project? <- this matters – mickmackusa Jan 28 '20 at 07:58
  • The idea is to save time, hence why the arrays are in different formats. I'm reviewing the various responses now. Thanks everyone for your time! – Joe Jan 28 '20 at 08:09
  • In the future, Joe, please always complete your question by providing your best failing coding attempt. This is a sign of respect toward volunteers among other things. – mickmackusa Jan 28 '20 at 08:29
  • Not my DV, just so you know. – mickmackusa Jan 28 '20 at 09:10

4 Answers

2

The simplest way to do this is to iterate over each of the arrays, using a function such as strpos to see if the short URLs are contained in the longer ones:

$output = array();
foreach ($array2 as $url) {
    $found = false;
    foreach ($array1 as $short_url) {
        $found = $found || (strpos($url, $short_url) !== false);
    }
    if (!$found) {
        $output[] = $url;
    }
}
print_r($output);

Output:

Array
(
    [0] => www-facebook-com
    [1] => www-twitter-com
)

Without knowing exactly what you mean by a BROAD match, strpos is probably close. You can always write a custom function to do the matching and replace strpos in the code above with it.
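
As a rough sketch of such a custom function, assuming the real data looks like the "example.com" and "https://www.example.com/" formats mentioned in the question's note: normalize_host() and is_broad_match() are hypothetical names, and the sample URLs below are made up for illustration only.

```php
<?php
// Hypothetical helper: reduce a URL or bare domain to a comparable host name.
function normalize_host(string $url): string
{
    // parse_url() only reports a host when a scheme is present, so add one if missing.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }
    $host = strtolower((string) parse_url($url, PHP_URL_HOST));
    // Drop a leading "www." so "www.example.com" and "example.com" compare equal.
    return preg_replace('/^www\./', '', $host);
}

// Hypothetical matcher: true when $url is on $domain or on one of its subdomains.
function is_broad_match(string $url, string $domain): bool
{
    $host = normalize_host($url);
    $suffix = '.' . normalize_host($domain);
    return '.' . $host === $suffix || substr($host, -strlen($suffix)) === $suffix;
}

// Illustrative data only (an assumption, not the asker's real input).
$array1 = ['google.com', 'youtube.com'];
$array2 = [
    'https://www.google.com/',
    'http://youtube.com/watch',
    'https://www.facebook.com/',
    'https://www.twitter.com/',
];

$output = [];
foreach ($array2 as $url) {
    $found = false;
    foreach ($array1 as $short_url) {
        if (is_broad_match($url, $short_url)) {
            $found = true;
            break; // no need to test the remaining short URLs
        }
    }
    if (!$found) {
        $output[] = $url;
    }
}
print_r($output);
```

Comparing host names rather than raw substrings avoids accidental hits such as "notgoogle.com" matching "google.com", at the cost of being stricter than strpos.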

Nick
2

When making iterated searches, always provide an early exit.

Nested loops with early break conditions will be most performant.

Code: (Demo)

$array1 = array(
"google-com",
"youtube-com",
);

$array2 = array(
"www-google-com",
"www-youtube-com",
"www-facebook-com",
"www-twitter-com",
);

foreach ($array2 as $index => $haystack) {
    foreach ($array1 as $needle) {
        if (strpos($haystack, $needle) !== false) {
            unset($array2[$index]);
            break;
        }
    }
}
var_export(array_values($array2));

That said, if your data is somewhat predictable and you can prepare just one of the arrays, you can spare much of this iterated work.
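
For example, here is a minimal sketch of that idea. It assumes the only difference between the two formats is a leading "www-" (true for the sample data, not necessarily for the real project), so after stripping it an exact key lookup replaces the inner loop:

```php
<?php
$array1 = ['google-com', 'youtube-com'];
$array2 = ['www-google-com', 'www-youtube-com', 'www-facebook-com', 'www-twitter-com'];

// Flip the short list once so each check is a key lookup rather than a scan.
$blocked = array_flip($array1);

$result = [];
foreach ($array2 as $haystack) {
    // Assumption: stripping a leading "www-" is enough to align the two formats.
    $key = preg_replace('/^www-/', '', $haystack);
    if (!isset($blocked[$key])) {
        $result[] = $haystack;
    }
}
var_export($result);
```

This trades the broad substring match for an exact one, so it only fits when the preparation step is reliable.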

mickmackusa
  • There is no risk when iterating a copy of the data. @viv – mickmackusa Jan 28 '20 at 08:18
  • Please avoid seeding fears unless you can provide a case where this breaks. Foreach is working on a copy of the array by default. It would be better to inform the OP of the performance gains and stability of this answer. – mickmackusa Jan 28 '20 at 08:22
  • I stand corrected. – nice_dev Jan 28 '20 at 08:30
  • The most performant way, I feel (though it seems like overkill if the dataset is small), is to build a [trie](https://en.wikipedia.org/wiki/Trie) of `$array1` with each word reversed, and then to look up each site in `$array2`, also reversed, in that trie. – nice_dev Jan 28 '20 at 08:35
1

You're right, array_diff is not a solution here. One option is to use preg_grep to find the matching records and then unset their keys in $array2:

$array1 = array(
    "google-com",
    "youtube-com",
);

$array2 = array(
    "www-google-com",
    "www-youtube-com",
    "www-facebook-com",
    "www-twitter-com",
);

foreach ($array1 as $search) {
    // preg_quote() escapes regex metacharacters (e.g. the dots in real URLs).
    foreach (preg_grep('/' . preg_quote($search, '/') . '/', $array2) as $index => $value) {
        unset($array2[$index]);
    }
}

print_r($array2);
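
If the repeated preg_grep calls are a concern, a possible variation is to build a single alternation pattern and let PREG_GREP_INVERT return the non-matching entries in one pass. This is only a sketch, and it assumes $array1 is not empty (an empty list would produce a pattern that matches everything):

```php
<?php
$array1 = ['google-com', 'youtube-com'];
$array2 = ['www-google-com', 'www-youtube-com', 'www-facebook-com', 'www-twitter-com'];

// Escape each needle (including the "/" delimiter) before joining with "|".
$escaped = array_map(function ($needle) {
    return preg_quote($needle, '/');
}, $array1);
$pattern = '/' . implode('|', $escaped) . '/';

// PREG_GREP_INVERT keeps the entries that do NOT match the pattern;
// array_values() then resets the keys to start at 0.
$result = array_values(preg_grep($pattern, $array2, PREG_GREP_INVERT));
print_r($result);
```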
u_mulder
  • `preg_quote()` may be necessary in the project, but not in the sample data. I don't think I like the iterated `preg_` calls. – mickmackusa Jan 28 '20 at 07:56
  • Several of the answers here work beautifully. This has a shorter syntax but may be prone to breakage depending on your actual project input and is possibly the slowest performer. FYI @Joe – mickmackusa Jan 28 '20 at 08:15
  • Actually your response @mickmackusa does seem to be more 'polished' in the sense that it resets the keys as well, and makes the new first key "0". If I end up using your answer I will change the correct answer selection on this post. Thanks! – Joe Jan 28 '20 at 08:19
0

Hope this answers your question :)

$array1 = array(
"google-com",
"youtube-com",
);
$array1 = preg_filter('/^/', 'www-', $array1);

$array2 = array(
"www-google-com",
"www-youtube-com",
"www-facebook-com",
"www-twitter-com",
);


print_r(array_diff($array2, $array1));

BhAvik Gajjar
  • "Hope this helps" is not an explanation of how your solution works nor why you believe the technique is advisable. – mickmackusa Jan 28 '20 at 08:03
  • The problem with this answer is that it doesn't account for the many possible variations of a URL. For example, if the URL is http or https and also the trailing slash on the end. It's a very narrow response, but thanks nonetheless! – Joe Jan 28 '20 at 08:14
  • If your posted sample data doesn't effectively reflect the quality and variability of your project data, then the onus is on your sample data, not this volunteer's solution. You must provide realistic data if you expect to receive the best / most durable solutions. @Joe – mickmackusa Jan 28 '20 at 09:51