
I have a string that I'm turning into an array, and I want to split it into words using a regex. At the moment I match whole words with the function below.

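function substr_count_array($haystack, $needle)
{
    $initial = 0;
    // Split on single spaces only, so "animals," or "animals" never equals "animal".
    $bits = explode(' ', $haystack);

    foreach ($needle as $substring)
    {
        // Skip needles that are not present as a whole space-delimited word.
        if (!in_array($substring, $bits))
        {
            continue;
        }

        // Note: substr_count() still counts partial occurrences inside other words.
        $initial += substr_count($haystack, $substring);
    }

    return $initial;
}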

The problem is that it matches the string animal, for example, but not animals. If I do a partial match instead, like this:

function substr_count_array2($haystack, $needle)
{
    $initial = 0;

    foreach ($needle as $substring)
    {
        $initial += substr_count($haystack, $substring);
    }

    return $initial;
}

It also matches, say, a, since a is contained within the word animals, and returns 2. How do I explode() using a regular expression as a delimiter so that I can, for example, match every string that is 5-7 characters long?

Explained simpler:

$animals = array('cat','dog','bird');
$toString = implode(' ', $animals);
$data = array('a');

echo substr_count_array($toString, $data);

If I search for a character such as a, it gets through the check and validates as a legit value because a is contained within the first element. But if I match whole words exploded by a space, it omits them if they are not separated by a space. Thus, I need to separate with a regular expression that matches anything AFTER the string that is to be matched.

Noam M
Jessie Stalk

1 Answer


Simply put, you need to use preg_split instead of explode.

While explode will split on constant values, preg_split will split based on a regular expression.

In your case, it would probably be best to split on non-word characters \W+, then manually filter the results for length.
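As a rough sketch of that suggestion: the helper below splits on `\W+` and then filters by length. The name `words_by_length` and the sample sentence are made up for illustration, and the 5-7 range comes from the question. It assumes plain ASCII words, since `strlen` counts bytes rather than characters.

```php
<?php
// Hypothetical helper (not from the original post): split $haystack into
// words and keep only those whose length is between $min and $max inclusive.
function words_by_length(string $haystack, int $min, int $max): array
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or
    // trailing separators would otherwise produce.
    $words = preg_split('/\W+/', $haystack, -1, PREG_SPLIT_NO_EMPTY);

    // array_filter preserves the original keys, so array_values reindexes from 0.
    return array_values(array_filter($words, function ($word) use ($min, $max) {
        $length = strlen($word);
        return $length >= $min && $length <= $max;
    }));
}

// Sample input, assumed for illustration only.
$result = words_by_length('cat, dog; animals and birds!', 5, 7);
print_r($result);
```

With this input, only animals and birds fall inside the 5-7 range; cat, dog and and are filtered out.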

Mr. Llama
  • Something like this? `preg_split('(.+?)', $haystack);` – Jessie Stalk Jul 18 '14 at 17:12
  • @JessieStalk - Not quite. The regular expression you pass to `preg_split` is the pattern the string is *split* on, not what strings you want to *keep*. If you're trying to keep the words in your input, you should split on non-word characters: `preg_split('/\W+/', $haystack)` – Mr. Llama Jul 18 '14 at 19:19
  • Thanks for your time and effort :) – Jessie Stalk Jul 18 '14 at 19:23
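To illustrate the split-versus-keep distinction from the comments, here is a small sketch that runs both approaches on the same input. The sample string and the variable names are assumptions; `preg_match_all` is shown only as one possible alternative for when it is easier to describe the words to keep than the separators.

```php
<?php
// Sample input, assumed for illustration only.
$haystack = 'cat, dog; animals and birds!';

// Split: the pattern describes the separators (runs of non-word characters).
$split = preg_split('/\W+/', $haystack, -1, PREG_SPLIT_NO_EMPTY);

// Match: the pattern describes what to keep, here whole words of
// 5 to 7 word characters. Full matches end up in $matches[0].
preg_match_all('/\b\w{5,7}\b/', $haystack, $matches);
$kept = $matches[0];

print_r($split);
print_r($kept);
```

Here `$split` holds every word, while `$kept` holds only animals and birds, because the `\b` word boundaries stop the pattern from matching part of a longer word.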