2

I want to break a string according to the following rules:

  1. all consecutive alphanumeric chars, plus the dot (.), must be treated as one part
  2. all other consecutive chars must be treated as one part
  3. consecutive combinations of 1 and 2 must be treated as different parts
  4. whitespace must not be returned

For example this string:

Method(hierarchy.of.properties) = ?

Should return this array:

Array
(
    [0] => Method
    [1] => (
    [2] => hierarchy.of.properties
    [3] => )
    [4] => =
    [5] => ?
)

I had no success with preg_split(): as far as I know, it cannot return the text matched by the pattern as an element of the result.

Any idea for a simple way to do this?

j0k
BenMorel
  • I thought about something like `preg_split('/[^a-z0-9\.]+/i', ...)` but couldn't go much further for the reason mentioned above. – BenMorel Jun 27 '11 at 12:12

2 Answers

3

You should probably use preg_match_all() rather than preg_split().

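// [\w.]+ matches a run of word characters and dots; [^\w\s.]+ matches a run of any other non-whitespace characters.
preg_match_all('/[\w.]+|[^\w\s.]+/', $string, $matches);
print_r($matches);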

Output:

Array
(
    [0] => Array
        (
            [0] => Method
            [1] => (
            [2] => hierarchy.of.properties
            [3] => )
            [4] => =
            [5] => ?
        )

)
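Since the result is nested one level deep, it may be convenient to wrap the call and return only the full matches. A minimal sketch, assuming a hypothetical helper named tokenize() (not part of the answer above). Note that a `|` inside a character class is a literal pipe, not an alternation, so it is left out of the class here, and the dot is excluded from the second class so that it always belongs to the first kind of part:

```php
<?php
// Hypothetical helper (the name is an assumption): returns the flat list of parts.
function tokenize(string $input): array
{
    // [\w.]+     : a run of word characters and dots
    // [^\w\s.]+  : a run of anything else, except whitespace
    preg_match_all('/[\w.]+|[^\w\s.]+/', $input, $matches);

    // The pattern has no capture groups, so index 0 holds every full match.
    return $matches[0];
}

print_r(tokenize('Method(hierarchy.of.properties) = ?'));
```

Whitespace is never matched by either alternative, so it is simply skipped between parts.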
Glass Robot
  • Makes sense, I was too focused on `preg_split()`! Your code works for me. Thank you. – BenMorel Jun 27 '11 at 12:48
  • 1
    Just a side note: beware that `\w` is locale-dependent, and might match accented chars as well. It's better to stick with `[a-z]` and keep the control over what's matched and what's not! – BenMorel Jun 27 '11 at 12:56
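To illustrate the side note, here is a sketch with the accepted characters spelled out instead of `\w`. The helper name tokenizeAscii() is hypothetical. One visible difference: `\w` also matches the underscore, whereas the explicit class treats it as an "other" character.

```php
<?php
// Hypothetical helper (the name is an assumption): explicit ASCII classes instead of \w.
function tokenizeAscii(string $input): array
{
    // The /i flag makes both the class and its negation case-insensitive.
    preg_match_all('/[a-z0-9.]+|[^a-z0-9.\s]+/i', $input, $matches);

    return $matches[0];
}

// The underscore is not alphanumeric here, so it becomes a part of its own.
print_r(tokenizeAscii('get_value(a.b) == 10'));
```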
0

This should do what you want:

$matches = array();
$string = "Method(hierarchy.of.properties) = ?";
foreach(preg_split('/(12|[^a-zA-Z0-9.])/', $string, 0, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY) as $match) {
    if (trim($match) != '')
        $matches[] = $match;
}

I used a loop to remove all whitespace matches since, as far as I know, there isn't a feature in preg_split() to do that for you.

EdoDodo
  • Unfortunately, your code breaks with several consecutive non-dot-and-alphanumeric chars: `==` gets split into `=` and `=`. I'll go with Glass Robot's code, which looks cleaner & stronger. Thanks anyway! – BenMorel Jun 27 '11 at 12:48
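
For anyone who still prefers preg_split(), the following is a sketch of one way to keep runs such as `==` together; it is a variation, not EdoDodo's original code. The idea is to capture whole runs of symbols with `+`, and to match whitespace outside the capture group so that it is consumed as a delimiter but never returned. The sample string is an assumption chosen to include `==`.

```php
<?php
$string = 'a == Method(hierarchy.of.properties)';

// \s+                    : whitespace, used as a delimiter and dropped
// ([^a-zA-Z0-9.\s]+)     : a run of other symbols, captured so it is returned
$parts = preg_split(
    '/\s+|([^a-zA-Z0-9.\s]+)/',
    $string,
    -1,
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);

print_r($parts);
```

PREG_SPLIT_NO_EMPTY discards the empty pieces that appear between adjacent delimiters, so no filtering loop is needed.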