0

I am not very good at regex, but I need to convert the following example from this

<li>Creations by Carol - www.driedfloralcreations.com</li>

to

<li>Creations by Carol - <a href="http://www.driedfloralcreations.com" rel="external">www.driedfloralcreations.com</a></li>
Peter Boughton
Brad
  • 4
    what language are you using to do this? This can't be accomplished just in HTML. – GSto Oct 13 '09 at 20:29
  • I am doing this in my TextMate search & replace part, sorry I did not mention that earlier. – Brad Oct 13 '09 at 20:40
  • 1
    Just take the pattern from my script then: www\.[a-z\d-\.]+\.[a-z]+ – David Snabel-Caunt Oct 13 '09 at 20:45
  • I am terribly sorry, but reading the question and some of the answers and comments has led me to believe that some people think "www" as the leftmost label of a domain name might mean something special. Why? – shylent Dec 26 '09 at 14:08

4 Answers

2

How about this in PHP?

$string = '<li>Creations by Carol - www.driedfloralcreations.com</li>';
$pattern = '/(www\.[a-z\d-\.]+\.[a-z]+)/i';
$replacement = '<a href="http://$1" rel="external">$1</a>';
echo preg_replace($pattern, $replacement, $string);

Assumes your links are always www.something.extension.
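For anyone following along outside PHP, here is a rough Python sketch of the same substitution. The function name `linkify` is made up for illustration. It uses the variant of the pattern with the hyphen escaped inside the character class, as suggested in the comments for TextMate; as far as I know, Python's `re` module also rejects the unescaped `\d-\.` form as a bad character range.

```python
import re

# Same pattern as the PHP answer, with the hyphen escaped inside the
# character class. re.IGNORECASE plays the role of the trailing /i flag.
URL_PATTERN = re.compile(r"(www\.[a-z\d\-\.]+\.[a-z]+)", re.IGNORECASE)


def linkify(html):
    """Wrap each bare www.* hostname in an external link (hypothetical helper)."""
    # \1 refers to the captured hostname, like $1 in the PHP replacement.
    return URL_PATTERN.sub(r'<a href="http://\1" rel="external">\1</a>', html)


if __name__ == "__main__":
    print(linkify("<li>Creations by Carol - www.driedfloralcreations.com</li>"))
```

Like the PHP version, this only catches hosts that start with www. and makes no attempt to validate the TLD.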

David Snabel-Caunt
  • You forgot uppercase and symbols, and you did not escape the last dot, and the last part of the URL is not _really_ a [a-z]+, but rather a list of choices. – NewbiZ Oct 13 '09 at 20:35
  • 4
    The i after the closing slash denotes a case-insensitive match. I've added the missing backslash. I made the assumption that Brad doesn't want to enumerate hundreds of TLDs and that his users will enter valid domains. He didn't ask for an exhaustive or highly complex solution, so I wrote a simple regex. – David Snabel-Caunt Oct 13 '09 at 20:43
  • 1
    It's for a regex replacement in a text editor, so quick and dirty is desirable. However, some editors have non-conventional implementations that may differ from PHP's. Can someone confirm this will work in TM? – Samantha Branham Oct 13 '09 at 20:55
  • This will work in TextMate with only one modification: the `-` in the character class needs to be escaped. Also, the case sensitivity flag is a checkbox. So, regex as follows: `(www\.[a-z\d\-\.]+\.[a-z]+)` – Emily Oct 13 '09 at 21:37
2

If you're only looking for URLs in <li> elements formatted like the one in your question, it should be much simpler than a lot of the other suggested solutions. I assume you don't really need to validate your URLs; you just want to take a list of site names and URLs and turn the URLs into links.

Your search pattern could be:

<li>(.+) - (https?:\/\/)?(\S+?)<\/li>

And the replace pattern would be:

<li>$1 - <a href="(?2:$2:http\://)$3" rel="external">$3</a></li>

Just tested the find/replace in TextMate and it worked nicely. It adds http:// if it isn't already present, and otherwise assumes that whatever comes after the - is a URL as long as it doesn't contain a space.
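If you want to sanity-check the logic outside TextMate, here is a rough Python sketch of the same idea. Python's `re` has no equivalent of the `(?2:$2:http\://)` conditional insertion, so the sketch uses a replacement function instead. The names `linkify_items` and `_build_link`, and the `example.org` input in the test call, are made up for illustration.

```python
import re

# Same search pattern as above; "/" needs no escaping in Python.
LI_PATTERN = re.compile(r"<li>(.+) - (https?://)?(\S+?)</li>")


def _build_link(match):
    """Build the replacement <li>, defaulting the scheme to http://."""
    label, scheme, address = match.group(1), match.group(2), match.group(3)
    # group(2) is None when the optional scheme did not match, which
    # mirrors the (?2:$2:http\://) conditional in the TextMate replacement.
    scheme = scheme or "http://"
    return f'<li>{label} - <a href="{scheme}{address}" rel="external">{address}</a></li>'


def linkify_items(html):
    """Turn 'Name - url' list items into linked list items (hypothetical helper)."""
    return LI_PATTERN.sub(_build_link, html)


if __name__ == "__main__":
    print(linkify_items("<li>Creations by Carol - www.driedfloralcreations.com</li>"))
    print(linkify_items("<li>Example - https://example.org</li>"))
```

As in the TextMate version, an existing http:// or https:// prefix is kept in the href but left out of the link text.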

For testing out regular expressions, Rubular is a great tool. You can paste in some text, and it'll show you what matches as you type your regex. It's a Ruby tool, but TextMate uses the same regex syntax as Ruby.

Emily
  • 1
    This looks good, but I think the \S+ match should be non-greedy just in case there is another <li>withoutspaces</li> following. – John La Rooy Oct 13 '09 at 21:23
  • Good suggestion, I didn't think of that. I've changed it. (Sorry to have misattributed the suggestion in the edit comments, though. That's what I get for copy/pasting too fast) – Emily Oct 13 '09 at 21:32