I'm building a comment section for a website. At first I needed a regular expression that finds every URL and wraps it in an anchor tag:

<a href="url"></a>  

So I found a super regular expression that matches all the URLs in a comment, and I wrote a function that returns the string with each URL wrapped in the HTML tag:

function addURLTags($string) {
    $pattern = "/(?i)\b((?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/";
    return preg_replace($pattern, '<a href="$1">$1</a>', $string);
}

Everything worked great. But a week ago my boss told me that I now have to add bbcode support to the comment section. I was like "no problem"... but then he told me that my function addURLTags has to stay.

So each of these strings:

http://www.google.com
[url]http://www.google.com[/url]
[url="http://www.google.com"]http://www.google.com[/url]

must be converted to the same output:

<a href="http://www.google.com">http://www.google.com</a>

So I got a small PHP library that converts all bbcode occurrences to HTML.

And I thought: "OK, first I should match all URL occurrences that do not have a [url] tag at the beginning, and then I replace all the bbcode tags."

So I tried to add a negative assertion at the beginning of the super regex, something like this:

/(?i)\b((?![url])(?:https?:\/\/|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}\/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))/

but it didn't work!

I'm fairly new to regular expressions, and none of the online testers I tried cope well with such a long regex. I don't know what else to try.

Do you have any suggestions? Do you know of a PHP library that does the URL replacing both with and without the [url] bbcode tags?

Thank you in advance.

Tomás

2 Answers

You are solving two problems here, so solve them separately. Don't squeeze everything into a single regular expression; that makes things more complicated, not less.

Divide and Conquer:

First use your bbcode library to locate the parts where those URLs are, so that you can split your text into a sequence of segments:

"normal text", "bbcode", "normal text", "bbcode"

Then you apply the bbcode library to create the URLs only on the "bbcode" segments, and your URL clickable-maker will be applied to the "normal text" segments only.

After all segments have been processed, concatenate them back into one string.

Voila, problem solved.
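
As a rough illustration of the segment idea, here is a minimal sketch. It assumes the bbcode spans can be located with preg_split, and it uses two hypothetical stand-ins: linkBareUrls() in place of the question's addURLTags() (with a much simpler pattern, only to keep the sketch short) and renderUrlTag() in place of the bbcode library. None of these names come from a real library, and HTML escaping is assumed to be handled elsewhere.

```php
<?php
// Hypothetical stand-in for addURLTags(); the simplified pattern is an assumption.
function linkBareUrls(string $text): string {
    return preg_replace('~\bhttps?://[^\s<>\[\]"]+~i', '<a href="$0">$0</a>', $text);
}

// Hypothetical stand-in for the bbcode library: handles only the two [url] forms.
function renderUrlTag(string $tag): string {
    if (preg_match('~^\[url(?:="?([^"\]]+)"?)?\](.*?)\[/url\]$~is', $tag, $m)) {
        // [url]x[/url] has no attribute, so the link text doubles as the href.
        $href = $m[1] !== '' ? $m[1] : $m[2];
        return '<a href="' . $href . '">' . $m[2] . '</a>';
    }
    return $tag; // not a [url] tag after all: leave it untouched
}

function renderComment(string $comment): string {
    // Split on [url]...[/url] spans and keep the spans themselves as segments.
    $segments = preg_split(
        '~(\[url(?:=[^\]]*)?\].*?\[/url\])~is',
        $comment,
        -1,
        PREG_SPLIT_DELIM_CAPTURE
    );
    $out = '';
    foreach ($segments as $i => $segment) {
        // With a single capture group, odd indexes hold the bbcode segments
        // and even indexes hold the normal text between them.
        $out .= ($i % 2 === 1) ? renderUrlTag($segment) : linkBareUrls($segment);
    }
    return $out;
}
```

Because each segment is handled by exactly one of the two functions, the URL regex never sees the inside of a [url] tag, so no lookaround is needed.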

hakre
  • I was focusing so much on doing it my way that I couldn't see the other possibilities. I split the comment string and saved all the bbcode strings, then sent those to the bbcode library. The rest of the strings worked fine with the super regex. Perfect. Thank you so much! – Tomás Apr 14 '12 at 20:47

It's better to parse the [url] BBCodes first, and then make any bare URLs into links. This is easily achieved by using a negative lookbehind to ensure there is not a double-quote before the URL. This works because you should already have converted quotes in the original string to &quot;, so any actual quotes before a URL must have been put there as part of your link creator.
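
A minimal sketch of that lookbehind, assuming the [url] tags have already been converted to anchors. The function name linkRemainingUrls() and the simplified URL pattern are hypothetical. Note that the second lookbehind, `(?<!">)`, is an addition beyond what is described above: without it, the link text of an anchor that was just created (which follows `">`, not a quote) would be linked a second time.

```php
<?php
// Hypothetical helper; the simplified URL pattern is an assumption.
function linkRemainingUrls(string $html): string {
    return preg_replace(
        // (?<!")  skips a URL that directly follows a double quote (an href value).
        // (?<!">) skips a URL that directly follows "> (existing link text).
        '~(?<!")(?<!">)\bhttps?://[^\s<>"]+~i',
        '<a href="$0">$0</a>',
        $html
    );
}
```

For example, given `<a href="http://www.google.com">http://www.google.com</a> and http://example.com`, only the trailing bare URL is wrapped. A URL nested inside another URL's query string would still slip past these lookbehinds, so treat this as a sketch rather than a complete solution.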

Niet the Dark Absol