3

I need a regular expression to strip out any BBCode in a string. I've got the following (and an array with tags):

new RegExp('\\[' + tags[index] + '](.*?)\\[/' + tags[index] + ']');

It picks up [tag]this[/tag] just fine, but fails when using [url=http://google.com]this[/url].

What do I need to change? Thanks a lot.

Luca Filosofi

8 Answers

6

I came across this thread and found it helpful in getting me on the right track. Here is the one I ended up with after two hours of work (it's my first regex!). It is written for JavaScript, and in my tests it handled deeply nested and even incorrectly nested tags:

// Strip opening and closing tags from the list, with or without an =value part.
string = string.replace(/\[\/?(?:b|i|u|url|quote|code|img|color|size)\b[^\]]*\]/gi, '');

If string = "[b][color=blue][url=www.google.com]Google[/url][/color][/b]" then the new string will be "Google". Amazing.

Hope someone finds that useful, this was a top match for 'JavaScript RegEx strip BBCode' in Google ;)

1

I had a similar problem, in PHP rather than JavaScript: I had to strip out BBCode [quote] tags together with the quoted text inside them. An added complication is that the [quote] tag often carries arbitrary extra data, e.g. [quote:7e3af94210="username"].

This worked for me:

$post = preg_replace('/[\r\n]+/', "\n", $post);
$post = preg_replace('/\[\s*quote[^\]]*\][^[]*\[\s*\/quote[^\]]*\]/i', '', $post);
$post = trim($post);

Lines 1 and 3 only tidy up newlines: the first collapses runs of line breaks, and trim() removes any left at the ends once the quotes are gone.

Coder
1

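You have to allow any characters other than ']' after the tag name, up to the closing ']'. In a JavaScript pattern the ']' inside the class must be escaped, because a bare [^] matches any character there.

new RegExp('\\[' + tags[index] + '[^\\]]*](.*?)\\[/' + tags[index] + ']');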

You could simplify this to the following expression.

\[[^\]]*]([^[]*)\[/[^\]]*]

The problem with that is that it will also match [WrongTag]stuff[/WrongTag]. Matching nested tags requires applying the expression multiple times.
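
As a rough sketch of what "multiple times" could look like with the per-tag expression, the loop below keeps replacing until the string stops changing. The function name stripPairedTags and the sample tag list are made up for illustration, and the tag names are assumed to contain no regex metacharacters.

```javascript
// Hypothetical helper: unwraps [tag]...[/tag] pairs, repeating until stable.
function stripPairedTags(text, tags) {
  var previous;
  do {
    previous = text;
    for (var index = 0; index < tags.length; index++) {
      // \b stops 'b' from matching '[bold]'; [^\]]* skips any '=value' part.
      var pair = new RegExp(
        '\\[' + tags[index] + '\\b[^\\]]*\\](.*?)\\[/' + tags[index] + '\\]',
        'gi'
      );
      // Keep only the content between the opening and closing tag.
      text = text.replace(pair, '$1');
    }
  } while (text !== previous); // another pass is needed for same-tag nesting
  return text;
}

var nested = '[quote][quote]a[/quote]b[/quote]';
console.log(stripPairedTags(nested, ['quote']));
```

A single pass over the nested [quote] sample leaves one pair behind, which is why the outer loop is there. Note that '.' does not cross line breaks here.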

Daniel Brückner
1

To strip out any BBCode, use something like:

var alltags = tags.join('|');
var stripbb = new RegExp('\\[/?(' + alltags + ')[^\\]]*\\]', 'g');

Replace globally with the empty string. No extra loop necessary.
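
For what it's worth, here is a minimal JavaScript sketch of that idea wrapped in a function. The names stripBBCode and escapeRegExp are invented for this example, and the added \b and the escaping step are my own assumptions rather than part of the answer above.

```javascript
// Hypothetical helper: escapes regex metacharacters in a tag name.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: removes every opening or closing tag from the list.
function stripBBCode(text, tags) {
  var alltags = tags.map(escapeRegExp).join('|');
  // \b keeps 'b' from matching '[bold]'; [^\]]* swallows '=value' attributes.
  var stripbb = new RegExp('\\[/?(?:' + alltags + ')\\b[^\\]]*\\]', 'gi');
  return text.replace(stripbb, '');
}

var input = '[url=http://google.com]this[/url] and [tag]that[/tag]';
console.log(stripBBCode(input, ['url', 'tag']));
```

Because the tags are removed individually, this does not care whether they are balanced or nested.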

Tomalak
  • [^\\\]] does not match characters other than ']' but characters other than '\' followed by ']' because you must not escape ']' in the first position. Correct is [^]]. – Daniel Brückner May 11 '09 at 13:12
  • There is no "followed by" in a character class. If anything, the character class matches everything except "\" and "]". I'll take out the surplus backslash. – Tomalak May 11 '09 at 14:11
1

You can check for balanced tags using a backreference:

 new RegExp('\\[(' + tags.join('|') + ')[^\\]]*](.*?)\\[/\\1]');

The real problem is that you can't match arbitrarily nested tags with a regular expression (that's a limit of regular languages). Some languages do allow recursive regular expressions, but those are extensions (technically they make the patterns non-regular, though that doesn't change the name most people use for them).

If you don't care about balanced tags, you can just strip out any tag you find:

 new RegExp('\\[/?(?:' + tags.join('|') + ')[^\\]]*]');
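
To illustrate what the backreference buys you, here is a small sketch of the balanced variant in use. The tag list and sample strings are made up, and the \b after the tag name is an extra assumption so that 'b' does not match '[bold]'.

```javascript
var tags = ['b', 'i', 'url'];
// \1 must repeat whatever tag name the first group captured.
var balanced = new RegExp(
  '\\[(' + tags.join('|') + ')\\b[^\\]]*\\](.*?)\\[/\\1\\]'
);

console.log(balanced.test('[b]bold[/b]')); // true
console.log(balanced.test('[b]bold[/i]')); // false, closing tag differs

var m = balanced.exec('[url=http://google.com]this[/url]');
console.log(m[1]); // "url"
console.log(m[2]); // "this"
```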
rampion
0

Remember that in many (most?) regex flavours the DOT meta character does not match line terminators by default, so a tag like

"[foo]dsdfs
fdsfsd[/foo]"

will fail to match. Either enable DOTALL by adding "(?s)" to your regex (in JavaScript, which does not accept a leading "(?s)", use the s flag where the engine supports it), or replace the DOT meta char in your regex with the character class [\S\s].
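
A quick JavaScript sketch of the difference, using the two-line sample above. The variable names are arbitrary, and the s flag in the last pattern assumes an ES2018-capable engine.

```javascript
var text = '[foo]dsdfs\nfdsfsd[/foo]';

var withDot = /\[foo\](.*?)\[\/foo\]/;        // '.' stops at the line break
var withClass = /\[foo\]([\s\S]*?)\[\/foo\]/; // [\s\S] matches any character
// Assumes an engine that supports the ES2018 's' (dotAll) flag.
var withFlag = new RegExp('\\[foo\\](.*?)\\[/foo\\]', 's');

console.log(withDot.test(text));   // false
console.log(withClass.test(text)); // true
console.log(withFlag.test(text));  // true
console.log(text.replace(withClass, '$1'));
```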

Bart Kiers
0

This worked for me, for every tag name. It also supports strings like '[url="blablabla"][/url]'.

str = str.replace(/\[([a-z]+)(\=[\w\d\.\,\\\/\"\'\#\,\-]*)*( *[a-z0-9]+\=.+)*\](.*?)\[\/\1\]/gi, "$4")
Manu
0

I think

new RegExp('\\[' + tags[index] + '(=[^\\]]+)?](.*?)\\[/' + tags[index] + ']');

should do it. Instead of group 1 you have to pick group 2 then.
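
A short sketch of the group numbering, assuming a one-element tags array and index 0 purely for illustration:

```javascript
var tags = ['url'];
var index = 0;
var re = new RegExp(
  '\\[' + tags[index] + '(=[^\\]]+)?](.*?)\\[/' + tags[index] + ']'
);

var withValue = re.exec('[url=http://google.com]this[/url]');
console.log(withValue[1]); // "=http://google.com" (the optional attribute)
console.log(withValue[2]); // "this" (the content)

var plain = re.exec('[url]this[/url]');
console.log(plain[1]); // undefined, the optional group did not take part
console.log(plain[2]); // "this"
```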

rudolfson
  • [^\\\]] does not match characters other than ']' but characters other than '\' followed by ']' because you must not escape ']' in the first position. Correct is [^]]. – Daniel Brückner May 11 '09 at 13:15