Replace www.example.com w/ <a href="http://www.example.com">www.example.com</a>

Question

I am not very good at regex, but I need to convert the following example from this

<li>Creations by Carol - www.driedfloralcreations.com</li>

to

<li>Creations by Carol - <a href="http://www.driedfloralcreations.com" rel="external">www.driedfloralcreations.com</a></li>

What language are you using to do this? This can't be accomplished in HTML alone. — GSto, Oct 13 '09 at 20:29
I am doing this in my TextMate search & replace part, sorry I did not mention that earlier. — Brad, Oct 13 '09 at 20:40
Just take the pattern from my script then: www\.[a-z\d-\.]+\.[a-z]+ — David Snabel-Caunt, Oct 13 '09 at 20:45
I am terribly sorry, but reading the question and some of the answers and comments has led me to believe that some people think "www" as the leftmost label of a domain name means something special. Why? — shylent, Dec 26 '09 at 14:08

David Snabel-Caunt · Answer 1 · 2009-10-13T20:40:57.710

2

How about this in PHP?

$string = '<li>Creations by Carol - www.driedfloralcreations.com</li>';
$pattern = '/(www\.[a-z\d-\.]+\.[a-z]+)/i';
$replacement = '<a href="http://$1" rel="external">$1</a>';
echo preg_replace($pattern, $replacement, $string);

This assumes your links always have the form www.something.extension.
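For readers who want to try the same idea outside PHP, here is a minimal Python sketch of the replacement above. The function name `link_www` is made up for illustration, and the hyphen inside the character class is escaped so the class stays unambiguous in Python's `re` module.

```python
import re

# Same idea as the PHP pattern above: "www.", then host characters, then a
# dot and a run of letters. The hyphen is escaped so it is read as a literal.
WWW_PATTERN = re.compile(r"(www\.[a-z\d\-.]+\.[a-z]+)", re.IGNORECASE)


def link_www(text: str) -> str:
    """Wrap every www.something.extension occurrence in an external link."""
    # \1 refers to the captured host name, like $1 in the PHP replacement.
    return WWW_PATTERN.sub(r'<a href="http://\1" rel="external">\1</a>', text)


print(link_www("<li>Creations by Carol - www.driedfloralcreations.com</li>"))
```

Like the PHP version, this only finds hosts that start with www. and does not check that the extension is a real TLD.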

edited Oct 13 '09 at 20:40

answered Oct 13 '09 at 20:32

You forgot uppercase and symbols, and you did not escape the last dot, and the last part of the URL is not _really_ [a-z]+, but rather a list of choices. – NewbiZ Oct 13 '09 at 20:35
The i after the closing slash denotes a case-insensitive match. I've added the missing backslash. I made the assumption that Brad doesn't want to enumerate hundreds of TLDs and his users will enter valid domains. He didn't ask for an exhaustive or highly complex solution, so I wrote a simple regex. – David Snabel-Caunt Oct 13 '09 at 20:43
It's for a regex replacement in a text editor - quick and dirty is desirable. However, some editors have non-conventional implementations that may differ from PHP's. Can someone confirm this will work in TM? – Samantha Branham Oct 13 '09 at 20:55
This will work in TextMate with only one modification: the `-` in the character class needs to be escaped. Also, the case sensitivity flag is a checkbox. So, regex as follows: `(www\.[a-z\d\-\.]+\.[a-z]+)` – Emily Oct 13 '09 at 21:37

Emily · Accepted Answer · 2009-10-13T21:30:55.773

If you're only looking for URLs in <li> elements formatted like the one in your question, this can be much simpler than many of the other suggested solutions. I assume you don't really need to validate your URLs; you just want to take a list of site names and URLs and turn the URLs into links.

Your search pattern could be:

<li>(.+) - (https?:\/\/)?(\S+?)<\/li>

And the replace pattern would be:

<li>$1 - <a href="(?2:$2:http\://)$3" rel="external">$3</a></li>

I just tested the find/replace in TextMate and it worked nicely. It adds http:// if it isn't already present, and otherwise assumes that whatever is after the - is a URL as long as it doesn't contain a space.
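The conditional part of the replace pattern, `(?2:$2:http\://)`, is TextMate syntax. As a rough illustration of what it does, here is a Python sketch that uses the same search pattern and emulates the conditional with a replacement function. The names `link_list_item` and `link_list` are hypothetical, and this is not how TextMate itself implements it.

```python
import re

# The search pattern from the answer, unchanged.
LI_PATTERN = re.compile(r"<li>(.+) - (https?:\/\/)?(\S+?)<\/li>")


def link_list_item(match: re.Match) -> str:
    name, scheme, rest = match.group(1), match.group(2), match.group(3)
    # Emulates (?2:$2:http\://): keep the captured scheme if group 2 took
    # part in the match, otherwise fall back to "http://".
    href = (scheme or "http://") + rest
    return f'<li>{name} - <a href="{href}" rel="external">{rest}</a></li>'


def link_list(html: str) -> str:
    """Turn the URL at the end of each '<li>name - url</li>' item into a link."""
    return LI_PATTERN.sub(link_list_item, html)


print(link_list("<li>Creations by Carol - www.driedfloralcreations.com</li>"))
print(link_list("<li>Some Site - https://example.org</li>"))
```

The second call uses a made-up list item to show that an existing scheme is kept in the href and left out of the link text, which is what `$3` does in the replace pattern.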

For testing out regular expressions, Rubular is a great tool. You can paste in some text, and it'll show you what matches as you type your regex. It's a Ruby tool, but TextMate uses the same regex syntax as Ruby.

This looks good, but I think the \S+ match should be non-greedy just in case there is another
Good suggestion, I didn't think of that. I've changed it. (Sorry to have misattributed the suggestion in the edit comments, though. That's what I get for copy/pasting too fast) — Emily, Oct 13 '09 at 21:32

score 1 · Answer 3 · answered Oct 13 '09 at 21:04

You have to be really clear about how much information you need to give the regex to avoid false positives.

For example, is the pattern www.something.somethingelse enough? Are there other occurrences of www in the file that would get caught?

Maybe <li> something - somethingelse</li> is the correct match. We cannot guess without knowing your whole file. There might be other <li> elements in there that you don't want to change.
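To make the false-positive concern concrete, here is a small Python sketch that runs both kinds of pattern over a made-up three-line sample. The sample text and variable names are assumptions for illustration only, and the patterns are simplified versions of the ones suggested elsewhere in this thread.

```python
import re

# A bare "www" pattern and an <li>-anchored pattern.
BARE_WWW = re.compile(r"www\.[a-z\d\-.]+\.[a-z]+", re.IGNORECASE)
LI_ITEM = re.compile(r"<li>(.+) - (\S+?)</li>")

# Hypothetical file content: one real target, one URL that is already a
# link, and one list item that merely looks like "name - something".
sample = "\n".join([
    "<li>Creations by Carol - www.driedfloralcreations.com</li>",
    '<p>See <a href="http://www.example.com">our site</a>.</p>',
    "<li>Home - Products</li>",
])

bare_hits = BARE_WWW.findall(sample)
li_hits = [m.group(2) for m in LI_ITEM.finditer(sample)]

print(bare_hits)  # also picks up the URL inside the existing href
print(li_hits)    # also picks up "Products", which is not a URL
```

Neither pattern is safe on this sample: the bare one would wrap an already linked URL a second time, and the anchored one would turn "Products" into a link. Which trade-off is acceptable depends on what else is in the file.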

score -2 · Answer 4 · answered Oct 13 '09 at 20:32

www\.[a-zA-Z0-9_-]+\.(fr|com|org|be|biz|info|getthelistsomewhere)
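As a sketch of how the alternation could be built from a list instead of typed by hand, the Python below joins a short, hypothetical TLD list into the pattern. The list contents, the trailing `\b`, and the name `find_known_hosts` are assumptions added for illustration. The word boundary keeps a short TLD such as "be" from matching the start of a longer one.

```python
import re

# Hypothetical, deliberately short list; a complete list would have to come
# from somewhere else, as the placeholder in the pattern above hints.
TLDS = ["fr", "com", "org", "be", "biz", "info"]

# Build www\.[a-zA-Z0-9_-]+\.(fr|com|...) from the list. The trailing \b is
# an addition so that "be" does not match the beginning of "berlin".
TLD_PATTERN = re.compile(
    r"www\.[a-zA-Z0-9_-]+\.(?:" + "|".join(map(re.escape, TLDS)) + r")\b"
)


def find_known_hosts(text: str) -> list:
    """Return the www hosts in text whose extension is in TLDS."""
    return TLD_PATTERN.findall(text)


print(find_known_hosts("visit www.driedfloralcreations.com or www.example.berlin"))
```

Note that this pattern allows only one label between www. and the extension, so a host like www.shop.example.com would not match.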

NewbiZ
