11

I want to take user-entered text from a comment field, check it for URL-like expressions, and, where one exists, wrap it in an anchor tag when the comment is displayed.

I am using PHP on the server side and JavaScript (with jQuery) on the client. Should I wait to check for URLs until right before the comment is displayed, or add the anchor tags before inserting it into the database?

so

<textarea id="comment">check out blahblah.com or www.thisthing.co.uk or http://checkthis.us/</textarea>  

becomes

<div id="commentDisplay">check out <a href="blahblah.com">blahblah.com</a> or <a href="www.thisthing.co.uk">www.thisthing.co.uk</a> or <a href="http://checkthis.us/">http://checkthis.us/</a></div>
Benjamin Loison
Douglas
  • 3
    I understand what you're trying to achieve, but as your example is syntactically invalid, I'd just warn about that: you need to specify external URL's with a **protocol** (http://), otherwise they will become relative and point to your own domain! Thus, `http://blahblah.com` and so on. – BalusC Dec 24 '09 at 17:12
  • 1
    If you do that kind of manipulation before inserting the comment in the DB, you'll have a problem if someone wants to edit his post : there will be some HTML in the middle of it ;; so, either do that manipulation when displaying, or store 2 versions of the comment in the DB (one "clean", and one "transformed/enriched") – Pascal MARTIN Dec 24 '09 at 17:13
  • @BalusC you are right, I meant to change that in the displayed, but I got copy-and-paste happy and forgot. – Douglas Dec 24 '09 at 17:45

8 Answers

22

First, a request. Don't do this before writing the data to the database. Instead, do it before displaying the data to the end-user. This will cut down on all confusion, and will allow you more flexibility in the future.

One example found online follows:

$text = preg_replace('@(https?://([-\w\.]+)+(:\d+)?(/([-\w/_\.]*(\?\S+)?)?)?)@', '<a href="$1">$1</a>', $text);

And a much more thorough one from daringfireball.net:

/**
 * Replace links in text with html links
 *
 * @param  string $text
 * @return string
 */
function auto_link_text($text)
{
   $pattern  = '#\b(([\w-]+://?|www[.])[^\s()<>]+(?:\([\w\d]+\)|([^[:punct:]\s]|/)))#';
   // create_function() was removed in PHP 8; use a closure instead
   return preg_replace_callback($pattern, function ($matches) {
       $url = array_shift($matches);

       // Link text: host plus path, without a leading "www."
       $text = parse_url($url, PHP_URL_HOST) . parse_url($url, PHP_URL_PATH);
       $text = preg_replace('/^www\./', '', $text);

       // Replace anything after the last slash with an ellipsis
       $last = -(strlen((string) strrchr($text, '/'))) + 1;
       if ($last < 0) {
           $text = substr($text, 0, $last) . '&hellip;';
       }

       return sprintf('<a rel="nofollow" href="%s">%s</a>', $url, $text);
   }, $text);
}
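
The advice to link at display time combines naturally with HTML escaping, since the stored comment is raw user input. The following is only a minimal sketch under stated assumptions: `render_comment` is a hypothetical helper name, and its deliberately simple pattern (http/https only) is not the one from this answer. It splits the raw text on URLs, escapes every piece, and wraps only the URL pieces in anchors.

```php
<?php
// Hypothetical helper (name and pattern are assumptions for illustration).
// Escapes the raw comment and links http(s) URLs at display time.
function render_comment(string $raw): string
{
    // Simple pattern: http:// or https://, then non-space characters,
    // not ending in common trailing punctuation.
    $pattern = '~(\bhttps?://[^\s<>"\']+[^\s<>"\'.,;:!?)])~i';

    // With one capture group, odd indexes hold URLs and even indexes hold text.
    $parts = preg_split($pattern, $raw, -1, PREG_SPLIT_DELIM_CAPTURE);

    $html = '';
    foreach ($parts as $i => $part) {
        $safe = htmlspecialchars($part, ENT_QUOTES, 'UTF-8');
        $html .= ($i % 2 === 1)
            ? '<a href="' . $safe . '" rel="nofollow">' . $safe . '</a>'
            : $safe;
    }

    return $html;
}
```

Because the database keeps the untouched text, the linking rules can be changed later without migrating stored comments.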
Sampson
  • this would work, although I am afraid of possible lag (it will be displaying lots of comments at a time) – Douglas Dec 24 '09 at 18:04
  • Give it a shot. I doubt you'll experience any noticeable lag. – Sampson Dec 24 '09 at 18:38
  • that's really neat (second function) – Alex Coplan Sep 20 '11 at 10:24
  • First function doesn't work with dashes in the URLs, as in this very page's URL: http://stackoverflow.com/questions/1959062/how-to-add-anchor-tag-to-a-url-from-text-input. – Pat Zabawa Jan 06 '14 at 17:08
  • 2
    @JonathanSampson when I used the second function with just a domain name without 'http://', it prepended the current domain name. E.g., if I type in www.google.com it shows as www.mydomain.com/www.google.com – user1846348 Feb 03 '14 at 22:46
13

I adapted Jonathan Sampson's regex option so that it is more lenient about what is a domain (doesn't need http(s) to qualify).

function hyperlinksAnchored($text) {
    return preg_replace('@(http)?(s)?(://)?(([-\w]+\.)+([^\s]+)+[^,.\s])@', '<a href="http$2://$4">$1$2$3$4</a>', $text);
}

Works for these URLs (and successfully leaves out trailing period or comma):

http://www.google.com/
https://www.google.com/.
www.google.com
www.google.com.
www.google.com/test
google.com
google.com,
google.com/test
123.com/test
www.123.com.au
ex-ample.com
http://ex-ample.com
http://ex-ample.com/test-url_chars.php?param1=val1.
http://ex-ample.com/test-url_chars?param1=value1&param2=val+with%20spaces

Hope that helps someone.

markd
  • 1
    I've been looking everywhere for an answer that works for all the different cases shown in your example. Thanks for taking the time to share this with the community! Great job! – zeckdude Jul 06 '13 at 09:09
  • 1
    @user1846348 is correct. It also won't work for a domain like httpfun.com – Dex Mar 19 '14 at 07:41
  • This should be adapted so that `(http)?(s)?(://)?` becomes `(https?://)?` -- I'm pretty sure that would solve the previously mentioned problems. (Still need to update the $1, $2, etc.) – Nathan J.B. Apr 26 '15 at 04:06
  • 1
    This also currently captures numbers, like 399.99 – Nathan J.B. Apr 26 '15 at 04:17
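
A sketch of the adaptation suggested in the comments above: the scheme becomes one optional group, `(https?://)?`, so a domain like httpfun.com is no longer split, and the host must end in at least two letters, so a number like 399.99 is skipped. The function name and the alphabetic-ending rule are assumptions for illustration, not part of the original answer, and the input is assumed to be HTML-escaped already. Email domains and file names such as notes.txt would still be linked.

```php
<?php
// Hypothetical variant of hyperlinksAnchored(); the name and the
// alphabetic-TLD rule are assumptions, not part of the original answer.
// Assumes $text has already been HTML-escaped.
function hyperlinksAnchoredStrict(string $text): string
{
    // Group 1: optional scheme, matched as a single unit.
    // Group 2: dotted host ending in 2+ letters, then an optional path
    //          that may not end in a comma or period.
    $pattern = '@(https?://)?((?:[-\w]+\.)+[a-z]{2,}\b(?:/(?:[^\s]*[^\s,.])?)?)@i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Default to http:// when the text carried no scheme.
        $scheme = $m[1] !== '' ? $m[1] : 'http://';
        return '<a href="' . $scheme . $m[2] . '">' . $m[0] . '</a>';
    }, $text);
}
```

Using a callback instead of `$1`..`$4` placeholders keeps the href logic in one place, which sidesteps the need to renumber backreferences.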
3

Here is my code to format all links inside text, including emails and URLs with or without a protocol.

public function formatLinksInText($text)
{
    //Catch all links with protocol      
    $reg = '/(http|https|ftp|ftps)\:\/\/[a-zA-Z0-9\-\.]+\.[a-zA-Z]{2,}(\/\S*)?/'; 
    $formatText = preg_replace($reg, '<a href="$0" style="font-weight: normal;" target="_blank" title="$0">$0</a>', $text);

    //Catch all links without protocol
    $reg2 = '/(?<=\s|\A)([0-9a-zA-Z\-\.]+\.[a-zA-Z0-9\/]{2,})(?=\s|$|\,|\.)/';
    $formatText = preg_replace($reg2, '<a href="//$0" style="font-weight: normal;" target="_blank" title="$0">$0</a>', $formatText);

    //Catch all emails
    $emailRegex = '/(\S+\@\S+\.\S+)\b/';
    $formatText = preg_replace($emailRegex, '<a href="mailto:$1" style="font-weight: normal;" target="_blank" title="$1">$1</a>', $formatText);
    $formatText = nl2br($formatText);
    return $formatText;
}
Hoang Trung
2

Refining Markd's answer to avoid links on decimals, percentages, numerical dates (10.3.2001), ellipsis and IP addresses:

function addLinks($text) {
    return preg_replace('@(http)?(s)?(://)?(([a-zA-Z])([-\w]+\.)+([^\s\.]+[^\s]*)+[^,.\s])@', '<a target="ref" href="http$2://$4">$1$2$3$4</a>', $text);
}

Works for:

http://www.google.com/
https://www.google.com/.
www.google.com
www.google.com.
www.google.com/test
google.com
google.com,
google.com/test
www.123.com.au
ex-ample.com
http://ex-ample.com
http://ex-ample.com/test-url_chars.php?param1=val1.
http://ex-ample.com/test-url_chars?param1=value1&param2=val+with%20spaces

Does not create links for:

123.com/test (numeric domains without 'www')
Keep it up press of popular opinion........keep the average (ellipsis)
Rising 3.8% to 3.94 million from 3.79 million (percentages and decimals)
Edited by Andrew Brooke - 07.08.2013 19:57 (dd.mm.yyyy dates)
10.1.1.1 (IP Addresses)

Dex
1

Personally, I would mark it up with JS right before displaying; that seems more professional and sustainable than editing the user's comment yourself.

Gal
1

I would rather do that on the server side. JavaScript has a "lag": it runs only after the entire HTML DOM tree has been loaded and displayed in the web browser, so it may take a (short) while before the URLs are recognized and parsed. The client may see the links suddenly being replaced while still reading the content, which can lead to "wtf?" experiences. Nowadays that kind of behavior is all too quickly associated with advertising, spam, or spyware, so avoid it as much as possible. Don't use JS to change the content on load; do it only during user-controlled events (onclick, onchange, onfocus, etc.). Use the server-side language to change content prior to save or display.

So, just look for a PHP script that parses the text (or uses a regex) to construct full-fledged links from URLs in plain text. You can find a lot here. Good luck.

BalusC
  • i agree about the WTF statement and the lag, although i might have to adjust the DB column holding the comment to hold more characters to take into account those added by the PHP – Douglas Dec 24 '09 at 17:48
0

I'd simply suggest a useful WordPress plugin, External Links: https://wordpress.org/plugins/sem-external-links/

Till
0

Here is a small update to the accepted answer that also works for links without a protocol (no http(s)://). Previously these were linked, but as relative links, which didn't work.

I also added some comments for documentation.

/**
 * Replace links in text with html links
 *
 * @param  string $text Text to add links to
 * @return string Text with links added
 */
function auto_link_text( $text )
{
    $pattern = "#\b((?:https?:(?:/{1,3}|[a-z0-9%])|[a-z0-9.\-]+[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)/)(?:[^\s()<>{}\[\]]+|\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\))+(?:\([^\s()]*?\([^\s()]+\)[^\s()]*?\)|\([^\s]+?\)|[^\s`!()\[\]{};:'.,<>?«»“”‘’])|(?:(?<!@)[a-z0-9]+(?:[.\-][a-z0-9]+)*[.](?:com|net|org|edu|gov|mil|aero|asia|biz|cat|coop|info|int|jobs|mobi|museum|name|post|pro|tel|travel|xxx|ac|ad|ae|af|ag|ai|al|am|an|ao|aq|ar|as|at|au|aw|ax|az|ba|bb|bd|be|bf|bg|bh|bi|bj|bm|bn|bo|br|bs|bt|bv|bw|by|bz|ca|cc|cd|cf|cg|ch|ci|ck|cl|cm|cn|co|cr|cs|cu|cv|cx|cy|cz|dd|de|dj|dk|dm|do|dz|ec|ee|eg|eh|er|es|et|eu|fi|fj|fk|fm|fo|fr|ga|gb|gd|ge|gf|gg|gh|gi|gl|gm|gn|gp|gq|gr|gs|gt|gu|gw|gy|hk|hm|hn|hr|ht|hu|id|ie|il|im|in|io|iq|ir|is|it|je|jm|jo|jp|ke|kg|kh|ki|km|kn|kp|kr|kw|ky|kz|la|lb|lc|li|lk|lr|ls|lt|lu|lv|ly|ma|mc|md|me|mg|mh|mk|ml|mm|mn|mo|mp|mq|mr|ms|mt|mu|mv|mw|mx|my|mz|na|nc|ne|nf|ng|ni|nl|no|np|nr|nu|nz|om|pa|pe|pf|pg|ph|pk|pl|pm|pn|pr|ps|pt|pw|py|qa|re|ro|rs|ru|rw|sa|sb|sc|sd|se|sg|sh|si|sj|Ja|sk|sl|sm|sn|so|sr|ss|st|su|sv|sx|sy|sz|tc|td|tf|tg|th|tj|tk|tl|tm|tn|to|tp|tr|tt|tv|tw|tz|ua|ug|uk|us|uy|uz|va|vc|ve|vg|vi|vn|vu|wf|ws|ye|yt|yu|za|zm|zw)\b/?(?!@)
))#ix"; // i: case-insensitive; x: ignore the line break inside the pattern
    return preg_replace_callback( $pattern, function( $matches ) {
        $url = array_shift( $matches );

        // force http if no protocol included
        if ( !startsWith( $url, 'http' ) ) {
            $url = 'http://' . $url;
        }

        // make link text from url - removing protocol
        $text = parse_url( $url, PHP_URL_HOST ) . parse_url( $url, PHP_URL_PATH );
        
        // remove the www from the link text
        $text = preg_replace( "/^www\./", "", $text );

        // remove any long trailing path from url
        $last = -( strlen( strrchr( $text, "/" ) ) ) + 1;
        if ( $last < 0 ) {
            $text = substr( $text, 0, $last ) . "&hellip;";
        }

        // build the anchor tag
        return sprintf(
            '<a rel="nofollow" target="_blank" href="%s">%s</a>',
            $url, 
            $text
        );
    }, $text );
}

/**
 * Check strings for starting match
 *
 * @param  string $string String to check.
 * @param  string $startString Starting string to match.
 * @return boolean Whether string begins with startString.
 */
function startsWith( $string, $startString ) 
{ 
    $len = strlen($startString); 
    return (substr($string, 0, $len) === $startString); 
}
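
One caveat with the prefix check above: `startsWith( $url, 'http' )` is also true for a bare domain such as httpfun.com (the example raised in an earlier comment thread), so that URL would get no protocol and stay relative. A minimal sketch of a stricter check follows; `ensure_scheme` is a hypothetical helper name, not part of this answer, and it tests for the full `http://` or `https://` prefix instead.

```php
<?php
// Hypothetical helper (the name is an assumption for illustration).
// Prepends http:// unless the URL already starts with http:// or https://.
function ensure_scheme(string $url): string
{
    return preg_match('~^https?://~i', $url) === 1
        ? $url
        : 'http://' . $url;
}
```

Inside the callback, `$url = ensure_scheme( $url );` could stand in for the `startsWith` branch.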
circlecube