
Currently I have the following situation: I'm trying to "parse" a text, looking for placeholders (their notation is "{...}"), and later on I will replace them with the actual text.

I thought about regular expressions:

$foo = "Hello World {foo{bar}} World Hello!";
$bar = array();
preg_match_all('@\{.+\}@U', $foo, $bar);
var_dump($bar);

But this returns

array(1) { [0]=> array(1) { [0]=> string(9) "{foo{bar}" } }

Making it greedy will result in:

array(1) { [0]=> array(1) { [0]=> string(10) "{foo{bar}}" } }

But I want the result to be something like:

array(1) { [0]=> array(2) { [0]=> string(5) "{bar}" [1]=> string(10) "{foo{bar}}" } }

Is there a way to achieve this with preg_match(_all) and regular expressions?

Or do I have to loop over my $bar again and again, until there are no sub-statements left in the result set?

sree
Timetrick
  • How are you going to replace the text anyway? It would make sense to iterate while there are placeholders to be found if you want to allow nested groups. – Mikulas Dite Sep 03 '12 at 09:03

1 Answer


You're lucky you have PCRE for this. This has to be solved using recursion: http://regex101.com/r/pO3hA0

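/(?=({(?>[^{}]|(?1))+}))/g

(You don't need the g flag in PHP; preg_match_all already returns every match. Because the whole pattern is a lookahead, the full match is empty and the placeholders end up in capture group 1.)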
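For reference, here is a minimal sketch of how the pattern could be used from PHP. The variable names ($pattern, $matches) are mine, and I escaped the literal braces for readability, which should not change what the pattern matches.

```php
<?php
// Sample text from the question.
$foo = "Hello World {foo{bar}} World Hello!";

// Lookahead wrapped around capture group 1; (?1) recurses into group 1,
// so balanced nested braces are matched. No g flag: preg_match_all
// already collects every match.
$pattern = '/(?=(\{(?>[^{}]|(?1))+\}))/';

$matches = array();
preg_match_all($pattern, $foo, $matches);

// $matches[0] contains only empty strings (the lookahead consumes nothing);
// the placeholders themselves are in $matches[1].
var_dump($matches[1]);
```

With the sample text, $matches[1] should contain "{foo{bar}}" followed by "{bar}", since the scan runs left to right and the outer placeholder starts first.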

Firas Dib
  • Thank you for that! Is there a "guarantee" that {bar} will always precede {foo{bar}} in the result array? – Timetrick Sep 03 '12 at 09:09
  • The engine will match from the "outside" and work its way in, if you will. So you _should_ get `[0] => {foo{bar}} [1] => {bar}` – Firas Dib Sep 03 '12 at 09:16
  • Thanks, but I wrote a little function that will sort the results by "complexity" (counting the braces in the result...) – Timetrick Sep 03 '12 at 09:25
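
The sorting by "complexity" mentioned in the last comment was not posted, so the following is only a guess at what it might look like, not Timetrick's actual function. The name sortByNesting is hypothetical, and it assumes that counting opening braces is a good enough measure of nesting.

```php
<?php
// Hypothetical helper: order placeholders so that the ones with the
// fewest opening braces (the innermost ones) come first.
function sortByNesting(array $placeholders)
{
    usort($placeholders, function ($a, $b) {
        // Negative when $a has fewer opening braces than $b.
        return substr_count($a, '{') - substr_count($b, '{');
    });
    return $placeholders;
}

// Order as returned by the recursive pattern: outer placeholder first.
$found = array('{foo{bar}}', '{bar}');
$sorted = sortByNesting($found);
var_dump($sorted);
```

Here $sorted would hold "{bar}" first and "{foo{bar}}" second, which is the order asked for in the question. Placeholders with the same brace count keep no guaranteed relative order.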