I would like to parse this kind of code:

{articles mode="extrait" nb="3"}

I am using this regex:

@\{(articles)(?:(?:\s|\ )*(?:(\w+)="(\w+)"))*\}@

But it doesn't work. When I call preg_match_all, print_r on the $matches parameter gives:

Array (
    [0] => {articles mode="extrait" nb="3"}
    [1] => articles
    [2] => nb
    [3] => 3
)

I thought that the last * should do the trick of getting all the attributes instead of just the last one.

Do you see what is missing or incorrect?

Thank you in advance

DaveRandom
Pascal Messana
  • You're repeating a capturing group, which causes the capturing group to capture the last match. – nickb Dec 20 '12 at 15:52
  • And I may add there is no way to get an arbitrary number of captures in PHP. All you can do with regex is to match the whole block first and then run a second regex on the block to extract all attributes (in separate matches) – Martin Ender Dec 20 '12 at 15:59
  • Please see this answer to a similar question: http://stackoverflow.com/a/574968/1447613 – BornKillaz Dec 20 '12 at 16:23
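
The behaviour described in these comments can be reproduced with a short sketch. It assumes a simplified form of the question's pattern (plain \s* between attributes) and a hypothetical variable name $subject, so treat it as an illustration rather than the asker's exact code.

```php
<?php
// Simplified form of the pattern from the question
// (assumption: only ordinary whitespace separates the attributes).
$pattern = '@\{(articles)(?:\s*(\w+)="(\w+)")*\}@';
$subject = '{articles mode="extrait" nb="3"}';

preg_match($pattern, $subject, $m);

// Groups 2 and 3 sit inside a repeated group, so every iteration
// overwrites the previous capture and only the last pair survives.
print_r($m);
```

After the match, groups 2 and 3 hold nb and 3; the mode="extrait" pair was matched but its captures were overwritten.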

3 Answers

// Sample input; $string was left undefined in the original snippet.
$string = '{articles mode="extrait" nb="3"}';

$instances = array();

// First pass: capture the attribute text inside each {articles ...} tag.
preg_match_all( '/\{articles([^\}]+)\}/', $string, $articles );

foreach ( $articles[1] as $article )
{
   // Second pass: every name="value" pair becomes its own match.
   preg_match_all( '/\b(\w+)="([^"]+)"/', $article, $arguments );

   if ( false === empty( $arguments[0] ))
   {
      // Map attribute names (group 1) to their values (group 2).
      $instances[] = array_combine( $arguments[1], $arguments[2] );
   }
}

print_r( $instances );
Ingmar de Lange

As @nickb already commented, a repeated capturing group retains only its last match. AFAIK, only .NET provides an implementation that retains all of them. So I agree with @m.buettner that you need at least two matches. @IngmardeLange's solution appears to be another implementation of that idea; I haven't checked it, but it also uses at least two matches.
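
The two-match approach mentioned above can be sketched roughly as follows. The helper name parse_tag_attributes and its parameters are hypothetical, and the sketch assumes attribute values consist only of word characters, as in the question's regex.

```php
<?php
// Hypothetical helper: returns one name => value array per {tag ...} occurrence.
function parse_tag_attributes($tag, $subject)
{
    $result = array();

    // Match 1: find each whole tag and capture its attribute text.
    $outer = '/\{' . preg_quote($tag, '/') . '((?:\s+\w+="\w+")*)\s*\}/';
    preg_match_all($outer, $subject, $tags);

    foreach ($tags[1] as $attrText) {
        // Match 2: each name="value" pair is now a separate match,
        // so none of them is overwritten by a later one.
        preg_match_all('/(\w+)="(\w+)"/', $attrText, $pairs, PREG_SET_ORDER);

        $attrs = array();
        foreach ($pairs as $pair) {
            $attrs[$pair[1]] = $pair[2];
        }
        $result[] = $attrs;
    }

    return $result;
}

print_r(parse_tag_attributes('articles', '{articles mode="extrait" nb="3"}'));
```

The outer pattern only locates the tag; the inner preg_match_all does the actual extraction, which is why the number of attributes is not limited.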

For fun, I devised a way to do this using a single match. The initial idea was to use a lookbehind for the @{articles part, but variable-length lookbehinds aren't supported. Then (unfortunately, as you're about to witness) I remembered @TimPietzcker once mentioning a trick for emulating variable-length lookbehinds: using variable-length lookaheads on the reversed string. (Please don't ever actually use this method.)

<?php

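    function get_attr_val_matches($tag, $subject)
    {
        // The subject is matched in reverse, so the pattern is written backwards:
        // a reversed name="value" pair, followed (via lookahead) by any further
        // reversed pairs and the reversed "@{tag" prefix.
        $regex = '/"(\w+)"=(\w+)\s+(?=(?:"\w+"=\w+\s+)*' . strrev($tag) . '\{@)/';
        preg_match_all($regex, strrev($subject), $matches, PREG_SET_ORDER);

        foreach ($matches as &$match)
        {
            // Un-reverse each captured string, then restore name/value order.
            $match = array_map('strrev', $match);
            $match = array($match[0], array_reverse(array_slice($match, 1)));
        }
        unset($match); // break the reference left behind by foreach

        // Matches were found right-to-left; flip them back to document order.
        return array_reverse($matches);
    }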

    $tag = 'articles';
    $subject = '@{articles mode="extrait" nb="3"}';

    print_r(get_attr_val_matches($tag, $subject));

?>

Output:

Array
(
    [0] => Array
        (
            [0] =>  mode="extrait"
            [1] => Array
                (
                    [0] => mode
                    [1] => extrait
                )
        )

    [1] => Array
        (
            [0] =>  nb="3"
            [1] => Array
                (
                    [0] => nb
                    [1] => 3
                )
        )
)

Here's a running example.

Quite obviously, if I haven't disclaimed this enough already, all the reversing costs more than just doing two matches. But maybe there's an application in generically converting expressions with variable-length lookbehinds into reversed lookaheads as above, then back. Though probably not.

Andrew Cheong

Thanks for your answers, even though there are a few things I barely understand.

I found another way, which is much easier but limited to only 2 arguments (I don't need more for the moment):

@\{(articles)((\s)(\w+)="(\w+)")?((\s)(\w+)="(\w+)")?\}@

Array 
( 
[0] => {articles nb="2" mode="extrait"} 
[1] => articles 
[2] => nb="2" 
[3] => 
[4] => nb 
[5] => 2 
[6] => mode="extrait" 
[7] => 
[8] => mode 
[9] => extrait 
)

And then I do something like:

// array_search returns false when nothing is found, so compare strictly.
$key = array_search('mode', $option);
$mode = ($key !== false) ? $option[$key + 1] : null;

if($mode === 'extrait')
{
    // my stuff here
}

Again, thank you for all your answers!

Pascal Messana