0

I would like to use preg_match_all() to extract the content between [[ and ]], while ignoring anything that has a third bracket on either side ([[[ or ]]]). For example, given this text:

$text = <<<TEXT
Some text going here

[[ 1. this is a text ]]

another text but multiple lines

[[ 2. this 
is a 
text ]]

This should be ignored, having 3 on the left

[[[ 3. this is a text ]]

This should be ignored, having 3 on the right

[[ 4. this is a text ]]]

This should be ignored, having 3 both on the left and right

[[[ 5. this is a text ]]]

This is the final sentence.

[[ 6. this is a text ]]
TEXT;

if (preg_match_all("(?!<\[)(\[\[.*?\]\])(?!\[)", $text, $tags, PREG_PATTERN_ORDER)) {
        $tags = $tags[0];
}

echo '<pre>';
print_r($tags);
echo '</pre>';

So only items 1, 2, and 6 should be selected. However, the regex I tried above selects everything except item 2, which is not what I expected.

user702300

3 Answers

4

You can use this pattern:

preg_match_all('~(?<!\[)\[\[(?!\[)([^]]*)]](?!])~', $text, $tags);

Notes:
There is no need to specify PREG_PATTERN_ORDER, since it is the default flag for preg_match_all().
I have added capturing parentheses around the content inside the square brackets; if you don't need them, you can remove them.
If square brackets are not allowed inside tags, the pattern can be shortened to:

~(?<!\[)\[\[([^][]*)]](?!])~
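
As a quick check, here is a minimal sketch that runs the first pattern against a shortened version of the question's sample text. The variable names `$count` and `$contents` and the `trim()` step are assumptions added for illustration; they are not part of the original answer.

```php
<?php
// Shortened sample adapted from the question: items 1, 2 and 6 are valid tags,
// items 3, 4 and 5 carry a third bracket on one or both sides.
$text = <<<TEXT
[[ 1. this is a text ]]
[[ 2. this
is a
text ]]
[[[ 3. this is a text ]]
[[ 4. this is a text ]]]
[[[ 5. this is a text ]]]
[[ 6. this is a text ]]
TEXT;

// (?<!\[)  no "[" immediately before the opening "[["
// (?!\[)   no "[" immediately after the opening "[["
// ([^]]*)  capture everything up to the next "]" (newlines included)
// (?!])    no "]" immediately after the closing "]]"
$pattern = '~(?<!\[)\[\[(?!\[)([^]]*)]](?!])~';

$count = preg_match_all($pattern, $text, $tags);

// $tags[0] holds the full matches, $tags[1] the captured content.
// trim() is an illustrative extra step to drop the padding spaces.
$contents = array_map('trim', $tags[1]);

print_r($contents);
```

Because the negated character class `[^]]` also matches newlines, the multi-line tag is captured without needing the `s` modifier.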
Casimir et Hippolyte
  • NICELY done. Had to be done with the lookarounds. Created a working example for you: http://regex101.com/r/wR3lD7 – brandonscript Dec 12 '13 at 01:27
  • Thank you so much! It worked great! Regex is so complicated, still learning :) – user702300 Dec 12 '13 at 01:39
  • It was the most elegant, and you didn't resort to making a giant wall of code to attract attention ;) – brandonscript Dec 12 '13 at 01:45
  • @r3mus: But if you want, you can build a wall of code with a single pattern if you use the verbose mode (\x), subpattern definitions (`(?(DEFINE)(?...)(?...))`) and comments. It is less concise, but can be more readable (and useful for long patterns). – Casimir et Hippolyte Dec 12 '13 at 01:58
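
To illustrate the extended-mode idea from the comment above, here is a hedged sketch of the same pattern written with the `x` modifier and inline comments. It only shows extended mode, not `(?(DEFINE)...)` subpatterns, and the test string and variable names are made up for the example.

```php
<?php
// With the x modifier, unescaped whitespace outside character classes is
// ignored and "#" starts a comment that runs to the end of the line.
$pattern = '~
    (?<!\[)     # not preceded by a third opening bracket
    \[\[        # the opening delimiter
    (?!\[)      # not followed by a third opening bracket
    ( [^]]* )   # capture everything up to the next closing bracket
    ]]          # the closing delimiter
    (?!])       # not followed by a third closing bracket
~x';

// Hypothetical input: only the first tag has exactly two brackets on each side.
$input = '[[ a ]] [[[ b ]] [[ c ]]]';

$count = preg_match_all($pattern, $input, $matches);

print_r($matches[1]);
```

The comments inside the pattern must not contain the `~` delimiter or a single quote, since either would end the pattern or the PHP string early.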
1

Here's a regex that should do the job:

((?<!\[)\[\[([^\[][^\]]*)\]\](?!\]))

Breaking this down:

  • Not preceded by a [
  • [[
  • One character that is not [
  • Any character except ], zero or more times
  • ]]
  • Not followed by a ]

This should be bulletproof, except that it requires at least one character between [[ and ]].
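
The one-character requirement can be seen in a small sketch that compares this pattern with the `[^]]*` variant from the other answer on an empty tag. The `~` delimiters, the input string and the variable names are assumptions added for illustration.

```php
<?php
// Pattern from this answer, wrapped in ~ delimiters so PHP accepts it.
// [^\[] must consume one character before the closing "]]".
$needsOneChar = '~((?<!\[)\[\[([^\[][^\]]*)\]\](?!\]))~';

// Variant from the other answer: [^]]* may match zero characters.
$allowsEmpty = '~(?<!\[)\[\[(?!\[)([^]]*)]](?!])~';

// Hypothetical input with one empty tag and one filled tag.
$input = 'empty: [[]] filled: [[ a ]]';

$n1 = preg_match_all($needsOneChar, $input, $m1);
$n2 = preg_match_all($allowsEmpty, $input, $m2);

print_r($m1[0]); // full matches found by the first pattern
print_r($m2[0]); // full matches found by the second pattern
```

If empty tags can occur in the input and should be matched, the second form is the safer choice.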

Daniel Gimenez
  • Thank you, it worked wonderfully! Just had to select the correct answer that came first by @Casimir et Hippolyte – user702300 Dec 12 '13 at 01:40
0

Try:

preg_match_all('/(\A|[^[])\[{2}[^[](?<content>[^]]+)[^]]\]{2}([^]]|\z)/s', ...)

http://regex101.com/r/jC2mM0

http://codepad.viper-7.com/bbs3oR

Array
(
    [0] => Array
        (
            [0] => 
[[ 1. this is a text ]]
            [1] => 
[[ 2. this 
is a 
text ]]
            [2] => 
[[ 6. this is a text ]]
        )

    [1] => Array
        (
            [0] => 1. this is a text
            [1] => 2. this 
is a 
text
            [2] => 6. this is a text
        )

    [2] => Array
        (
            [0] => 
            [1] => 
            [2] => 
        )

)
Petah