
I am trying to match a string of the form [[....]] (including the brackets),

where .... can be anything (including non-printable characters) except ]].

Here is the source string to match against:

var myString = `blablablabla[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong><br />Centre commercial LES 4 TEMPS",
                         48.89141725,
                         2.23478235,
                         "4T"],
    ["<strong>ANGERS</strong><br />Centre commercial GEANT",
                         48.89141725,
                         2.23478235,
                         "4T"]]blablablabla`;

I tried [^\]]+ to match every character (printable or not) other than a closing bracket. The problem is that I do not know how to apply this approach to a bracket that immediately follows the first bracket: [^\]\]]+.

Is there a solution using positive/negative lookahead or a word boundary?

(\[\[[^\](?=\])]+)

Any help, please?

Cédric

2 Answers

In JavaScript, the best way to match any text between delimiters that consist of more than one character is the [^]/[\s\S]/[\d\D]/[\w\W] construct with a lazy quantifier (*? matches 0 or more occurrences and +? matches 1 or more occurrences of the preceding subpattern, in both cases as few as needed to return a valid match).

While the [^] construct, which matches any character including a newline, is JavaScript-specific, [\s\S] and its variants are mostly cross-platform and will work in PCRE, .NET, Python, Java, etc. The [...] in this case is a character class that contains two opposite shorthand classes. Since \s matches all whitespace characters and \S matches all non-whitespace characters, [\s\S] matches any character in any input.
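
To make the difference concrete, here is a minimal sketch. The sample string and variable names are made up for illustration and are not from the question. It shows that . stops at a line break while [\s\S] and [^] do not:

```javascript
// Hypothetical two-line sample; not taken from the question.
var sample = 'a[[x\ny]]b';

// Without the s flag, . does not match \n, so this pattern cannot span the line break.
var dotMatch = sample.match(/\[\[.*?]]/);

// [\s\S] matches any character, newline included.
var anyMatch = sample.match(/\[\[[\s\S]*?]]/);

// [^] is an empty negated class: a JavaScript-only way to say "any character".
var emptyNegMatch = sample.match(/\[\[[^]*?]]/);

console.log(dotMatch);          // null
console.log(anyMatch[0]);       // the whole [[...]] block, newline included
console.log(emptyNegMatch[0]);  // same text as anyMatch[0]
```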

I'd recommend avoiding (.|\n). This construct causes more backtracking steps and slows the regex search down.

So, you can use

\[\[[\d\D]*?]]

Here is a code snippet:

var re = /\[\[[\d\D]*?]]/g; 
var str = 'blablablabla[["<strong>LA DEFENSE 4 TEMPS ( La Rotonde )</strong><br />Centre commercial LES 4 TEMPS",\n                         48.89141725,\n                         2.23478235,\n                         "4T"],\n    ["<strong>ANGERS</strong><br />Centre commercial GEANT",\n                         48.89141725,\n                         2.23478235,\n                         "4T"]]blablablabla';
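var m;

// With the g flag, each exec() call resumes right after the previous match.
while ((m = re.exec(str)) !== null) {
    console.log(m[0]); // the full [[...]] match, brackets included
}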

UPDATE

In this case, where the opening and closing delimiters differ and each consists of just 2 characters, you can use a technique often called unrolling the loop: match 0 or more characters other than the first character of the closing delimiter, then 0 or more sequences of that first character followed by 1 or more characters other than it, and finally the closing delimiter itself.

\[\[[^\]]*(?:][^\]]+)*]]

The linear character of this regex makes it really fast.
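
As a rough check that the unrolled pattern returns the same matches as the lazy one, here is a small sketch. The shortened sample text and the variable names are assumptions for illustration only:

```javascript
// Both patterns from this answer, run against a shortened, hypothetical sample.
var unrolled = /\[\[[^\]]*(?:][^\]]+)*]]/g;
var lazy = /\[\[[\d\D]*?]]/g;

var text = 'pre[["A",\n 1],\n ["B",\n 2]]mid[[x]]post';

// String.prototype.match with the g flag returns an array of all full matches.
var unrolledMatches = text.match(unrolled);
var lazyMatches = text.match(lazy);

console.log(unrolledMatches); // two blocks: the nested one and [[x]]
console.log(lazyMatches);     // the same two blocks
```

A single ] inside the block (as in `1],`) is consumed by the (?:][^\]]+)* group, while the final ]] is left for the closing delimiter.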

P.S. Note that you do not need to escape ] outside of a character class in a JS regex (unless the u flag is used), but inside a character class it must always be escaped.
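
A short sketch of that escaping rule follows. The variable names are made up, and the u-flag part is an extra observation about Unicode mode that is worth verifying in your own engine:

```javascript
// Outside a character class, an unescaped ] is treated as a literal (no u flag here).
var closing = /]]/;

// Inside a character class, ] is escaped: [^\]] means "any character except ]".
var notBracket = /[^\]]+/;

var closingFound = closing.test('a]]b');
var firstRun = 'ab]cd'.match(notBracket)[0];

console.log(closingFound); // true
console.log(firstRun);     // ab

// In Unicode mode (u flag) a lone ] is a syntax error, so it has to be escaped there.
var unicodeModeRejects = false;
try {
    new RegExp(']]', 'u');
} catch (e) {
    unicodeModeRejects = e instanceof SyntaxError;
}
console.log(unicodeModeRejects);
```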

Wiktor Stribiżew

Try this:

\[\[(.|\n)*?\]\]

https://regex101.com/r/gR5oJ3/1

It should match anything between and including [[ ]]. The main issue was dealing with newlines, and the (.|\n) part will match anything including newlines.

lintmouse