
In an application I'm writing, I want to use a RegExp to rewrite tag-like markup in a string, such as:

"{b}This is BOLD{/b}".replace(/\{b\}(.*?)\{\/b\}/gi, "00b3f[ $1 00b3d]");

// Returns "00b3f[ This is BOLD 00b3d]"

I am able to do this easily, but it gets complicated when a more complex string is passed to the function, for example:

"{red} This is RED {red} This also should be red {/red} and this {/red}"
.replace(/\{red\}(.*?)\{\/red\}/gi, "00b4f[ $1 00b4d]");

// Returns:
// "00b4f[  This is RED {red} This also should be red  00b4d] and this {/red}"

// Where the output should be:
// "00b4f[  This is RED 00b4f[ This also should be red 00b4d] and this 00b4d]"

I would like to solve this problem with a simple RegExp but I can't find a way to do it! I think I could do this with a while-loop but it would get too messy. Any suggestions?

Art McBain
Afonso Matos
  • You can't solve this problem with a regexp because the language you are describing is context-sensitive but not regular - in simple terms it means "you need memory to parse it" which is what you've observed. I recommend a simple recursive descent parser. – Benjamin Gruenbaum Dec 30 '14 at 20:38
  • Related: http://stackoverflow.com/q/1732348/1026459 – Travis J Dec 30 '14 at 20:43

1 Answer


Regex can't deal with nested expressions (unless you have access to a powerful regex implementation, which JavaScript doesn't provide), so a pure-regex solution is out of the question. But there's still an easy way to do this:

  1. replace all occurrences of \{(\w+)\}((?:(?!\{\w+\}).)*)\{\/\1\} (this matches a {tag}...{/tag} pair, but only if it does not contain another opening {tag}) with 00b4f[ $2 00b4d].
  2. repeat until there are no more matches.
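
The two steps above can be sketched as follows. This is only a minimal illustration: the function name replaceNestedTags is hypothetical, and the loop stops once a pass no longer changes the string, which is one way to express "repeat until there are no more matches".

```javascript
// Hypothetical helper (name not from the answer): applies the fixed
// replacement from step 1 until a pass leaves the string unchanged.
function replaceNestedTags(input) {
    // A {tag}...{/tag} pair whose body contains no other opening {tag}.
    var pairPattern = /\{(\w+)\}((?:(?!\{\w+\}).)*)\{\/\1\}/g;
    var previous;
    var current = input;
    do {
        previous = current;
        current = previous.replace(pairPattern, "00b4f[ $2 00b4d]");
    } while (current !== previous); // step 2: repeat until nothing changes
    return current;
}

console.log(replaceNestedTags("{b}x{b}y{/b}z{/b}"));
// prints "00b4f[ x00b4f[ y 00b4d]z 00b4d]"
```

Each pass can only rewrite pairs that no longer contain an opening tag, so a string nested n levels deep needs roughly n passes.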

To make this dynamic, use a callback function for the replacement:

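// Matches a {tag}...{/tag} pair whose body contains no other opening {tag}.
var tagPattern = /\{(\w+)\}((?:(?!\{\w+\}).)*)\{\/\1\}/g,
    // $0 is the whole match, $1 the tag name, $2 the text between the tags.
    tagReplacer = function ($0, $1, $2) {
        switch ($1) {
            case "b": return "00b3f[" + $2 + " 00b3d]";
            case "red": return "00b4f[" + $2 + " 00b4d]";
            default: return $2; // unknown tag: drop the tag, keep its content
        }
    };

// sourceString holds the text to convert. Each pass rewrites pairs that
// contain no opening tag, so nested pairs take several passes.
while (tagPattern.test(sourceString)) {
    sourceString = sourceString.replace(tagPattern, tagReplacer);
}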
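
For comparison, here is a single-pass sketch that keeps a stack of open tags, closer in spirit to the parser suggested in the comments. It is a different technique from the repeated-replace loop, not part of the answer. The names MARKERS and convertTags are hypothetical, and two behaviours are assumptions of this sketch: unknown tags are left untouched (the loop above strips them), and a closing tag that does not match the innermost open tag is left as-is.

```javascript
// Hypothetical marker table; the codes are the ones used in the question.
var MARKERS = {
    b:   { open: "00b3f[", close: " 00b3d]" },
    red: { open: "00b4f[", close: " 00b4d]" }
};

// Hypothetical function: converts known tags in one pass over the input.
function convertTags(input) {
    var stack = []; // names of the tags that are currently open
    return input.replace(/\{(\/?)(\w+)\}/g, function (token, slash, name) {
        var key = name.toLowerCase();
        if (!Object.prototype.hasOwnProperty.call(MARKERS, key)) {
            return token; // assumption: unknown tags pass through unchanged
        }
        if (slash === "") {
            stack.push(key); // opening tag: remember it
            return MARKERS[key].open;
        }
        if (stack.length > 0 && stack[stack.length - 1] === key) {
            stack.pop(); // closing tag matches the innermost open tag
            return MARKERS[key].close;
        }
        return token; // assumption: unbalanced closing tags stay as-is
    });
}

console.log(convertTags("{red}a{b}c{/b}d{/red}"));
// prints "00b4f[a00b3f[c 00b3d]d 00b4d]"
```

The stack is the "memory" the comment refers to: it records which tag each closing token must match, so the nesting depth no longer affects the number of passes.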
Tomalak
Aran-Fey