
Let's say I have this string:

1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52

I want to split it into an array of operators and non-operators, but anything between the () and ' must not be split.

I want the output to be:

[1, "+", 2, "*", "(3 + (23 + 53 - (132 / 5) + 5) - 1)", "+", 2, "/", "'test + string'", "-", 52]

I'm using this code:

preg_split("~['\(][^'()]*['\)](*SKIP)(*F)|([+\-*/^])+~", $str, -1, PREG_SPLIT_DELIM_CAPTURE);

This does what I want for the operators and the quoted string, but not for the parentheses: it only keeps (132 / 5) (the deepest nested parenthetical expression) intact and splits all the others, giving me this output:

[1, "+", 2, "*", "(3", "+", "(23", "+", 53, "-", "(132 / 5)", "+", "5)", "-", "1)", "+", 2, "/", "'test + string'", "-", 52]

How can I ensure that the outermost parenthetical expression and all of its contents remain together?

mickmackusa

2 Answers

You can recurse the first subpattern to match balanced parentheses and then apply (*SKIP)(*F) to it. After the alternation you can still use a capture group for the operators; it becomes group 2, and its values are kept because of the PREG_SPLIT_DELIM_CAPTURE flag.

To remove the empty entries, you can add the PREG_SPLIT_NO_EMPTY flag.

(?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])

$str = "1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52";
$result = preg_split("~(?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])~", $str, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

print_r($result);

Output

Array
(
    [0] => 1 
    [1] => +
    [2] =>  2 
    [3] => *
    [4] =>  (3 + (23 + 53 - (132 / 5) + 5) - 1) 
    [5] => +
    [6] =>  2 
    [7] => /
    [8] =>  'test + string' 
    [9] => -
    [10] =>  52
)
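
The pieces above still carry their surrounding spaces and the numbers are strings, whereas the expected output in the question has trimmed tokens and integer values. A minimal sketch of one way to post-process the result, assuming that purely numeric tokens should be cast; `splitExpression()` is a hypothetical helper name, not part of the original answer:

```php
// Hypothetical helper: split with the pattern above, then tidy each token.
function splitExpression(string $expr): array
{
    $pattern = "~(?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])~";
    $parts = preg_split($pattern, $expr, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    $tokens = [];
    foreach ($parts as $part) {
        $token = trim($part);
        if ($token === '') {
            continue; // whitespace-only piece, e.g. between two adjacent operators
        }
        // Assumption: purely numeric tokens should become int/float values.
        $tokens[] = is_numeric($token) ? $token + 0 : $token;
    }
    return $tokens;
}

$str = "1 + 2 * (3 + (23 + 53 - (132 / 5) + 5) - 1) + 2 / 'test + string' - 52";
$tokens = splitExpression($str);
print_r($tokens);
```

The parenthesised and quoted groups stay as single string tokens; only the whitespace around each piece is removed.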
The fourth bird
  • Hello, thanks for letting me know, this works perfectly. I also figured out another way of doing what I wanted: `('[^']*'|(\(.+\)))(*SKIP)(*F)|([+\-*/^]+)` – Saif Alhammouri Feb 14 '21 at 11:24
  • @SaifAlhammouri You are welcome. That would work for your example string, but note that if you, for example, add parentheses at the end of the string, it would not give the expected matches: https://regex101.com/r/QdtBK5/1 – The fourth bird Feb 14 '21 at 11:29
  • Oh, I didn't test it that way, thank you so much for letting me know, I'll be using your way. – Saif Alhammouri Feb 14 '21 at 13:10
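
The difference discussed in the comments can be reproduced with a small comparison. This is only a sketch: the input string with two top-level groups is made up for illustration, and the two patterns are copied from the comment and the answer. The greedy `\(.+\)` runs from the first `(` to the last `)`, so everything in between is skipped and nothing is split, while the recursive subpattern matches each balanced group on its own:

```php
// Pattern from the comment (greedy) and pattern from the answer (recursive).
$greedy    = "~('[^']*'|(\(.+\)))(*SKIP)(*F)|([+\-*/^]+)~";
$recursive = "~(?:(\((?:[^()]++|(?1))*\))|'[^']*')(*SKIP)(*F)|([+\-*/^])~";
$flags     = PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY;

// Made-up input: two separate parenthesised groups joined by an operator.
$input = "(1 + 2) * (3 - 4)";

// Greedy: the whole string is consumed by \(.+\) and skipped, so no split happens.
$greedyResult = preg_split($greedy, $input, -1, $flags);

// Recursive: each balanced group is skipped separately, so "*" is still found.
$recursiveResult = preg_split($recursive, $input, -1, $flags);

print_r($greedyResult);
print_r($recursiveResult);
```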

I do like @thefourthbird's recursive subpattern, but I would prefer to standardize the output elements so that all whitespace is removed.

Instead of delimiter capturing or skip-fail, I'll match each token, restart the full-string match with \K, and then consume an optional space so that the spaces are omitted.

Code:

// $str is the expression string from the question.
$result = preg_split(
    "~(?:(\((?:[^()]+|(?1))*\))|'[^']*'|[\d.]+|[*/^+-])\K ?~",
    $str,
    -1,
    PREG_SPLIT_NO_EMPTY
);

I have used similar techniques elsewhere on SO. Another consideration is: how do you want to handle signed numbers? Should the numeric entity retain the sign symbol, or should the sign be separated as if it were an operator?
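
If the sign should stay with the number, one possible approach is to leave the regex alone and merge tokens afterwards. The sketch below is an assumption about the desired behaviour, not part of the original answer: it treats `+` or `-` as a sign only when it is the first token or directly follows another operator, and only when the next token is numeric. `mergeSigns()` is a hypothetical helper name.

```php
// Hypothetical helper: glue a unary "+"/"-" onto the numeric token that follows it.
function mergeSigns(array $tokens): array
{
    $operators = ['+', '-', '*', '/', '^'];
    $out = [];
    $count = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        $token = $tokens[$i];
        $previous = $out ? $out[count($out) - 1] : null;

        $isSign = $token === '+' || $token === '-';
        // Unary position: start of the expression or right after an operator.
        $isUnaryPosition = $previous === null || in_array($previous, $operators, true);
        $nextIsNumber = $i + 1 < $count && is_numeric($tokens[$i + 1]);

        if ($isSign && $isUnaryPosition && $nextIsNumber) {
            $out[] = $token . $tokens[++$i]; // consume the number as well
        } else {
            $out[] = $token;
        }
    }
    return $out;
}

// Tokens as a whitespace-free split of "-3 * -2 - 1" might produce them.
$merged = mergeSigns(['-', '3', '*', '-', '2', '-', '1']);
print_r($merged);
```

A sign in front of a parenthesised group is left as a separate token here, because the group is not numeric; whether that is the right choice depends on how the tokens are evaluated later.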

mickmackusa