I have a big text in a PHP variable, and I'm looking for a good, fast method to retrieve all the links in it and store them in an array.

The text is plain ASCII, and the links are the common kind, like http://thesite.com or http://www.thesite.com. Thanks for any help.

DomingoSL

4 Answers

$text = 'Lorem ipsum http://thesite.com dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt https://www.thesite.com ut labore et dolore magna aliqua. Ut http://www.thesite.com enim ad minim veniam,';

$pattern = '!(https?://[^\s]+)!'; // refine this for better/more specific results

if (preg_match_all($pattern, $text, $matches)) {
    $links = $matches[1]; // the first capture group holds the URLs
    print_r($links);
}
Yoshi
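The comment in the pattern above invites refinement. One possible refinement is sketched below. It assumes PHP 7.4+ for the arrow function, and `find_links` is a hypothetical name. The pattern also stops at quotes and angle brackets, and trailing sentence punctuation is trimmed, so "http://thesite.com," comes back without the comma:

```php
<?php
// Sketch, not Yoshi's original: a stricter pattern plus trimming of
// punctuation that usually ends a sentence rather than a URL.
function find_links(string $text): array
{
    // Stop at whitespace, angle brackets and quotes.
    if (!preg_match_all('~https?://[^\s<>"\']+~i', $text, $matches)) {
        return [];
    }
    // rtrim() strips any run of these characters from the end of each match.
    return array_map(fn(string $url): string => rtrim($url, '.,;:!?'), $matches[0]);
}

// Prints the two URLs, without the trailing comma and angle bracket.
print_r(find_links('See http://thesite.com, or <https://www.thesite.com/page>.'));
```

Trimming after matching keeps the regex simple. The trade-off is that a URL which genuinely ends in one of those characters loses it.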

Search Google for a "URL regex", then insert it into the following code:

preg_match_all("/your url regex here/", $text, $matches);

All matches are now stored as an array in $matches[0].

Wulf
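The sketch below fills the placeholder with one concrete example. The pattern is an assumption, not a recommended "best" URL regex. It also contains a capture group, which shows how the $matches array is laid out: full matches at index 0, and each capture group at its own index.

```php
<?php
// Example input; the pattern below is only one possible URL regex.
$text = 'Visit http://thesite.com/about or https://www.example.org today.';

// Group 1 captures the host name, without a leading "www.".
$pattern = '~https?://(?:www\.)?([^\s/]+)[^\s]*~i';

preg_match_all($pattern, $text, $matches);

// $matches[0] holds the full matches, $matches[1] the first capture group.
print_r($matches[0]); // the URLs
print_r($matches[1]); // the hosts
```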

These regexes are all fine, but in practice such patterns grow over time and end up looking rather different. This one is not entirely my own work, nor is it ideal: it is based on code from a community project that has some years behind it, and it suits some needs. I compiled it into a single function:

echo make_clickable('test http://www.google.com/');

/**
 * make_clickable
 * 
 * make a text clickable
 * 
 * @param string $text to make clickable
 * @param callback $url callback to process URLs
 * @return string clickable text
 * @author hakre and contributors
 * @license GPL
 */
function make_clickable($text, $url = null) {
    if (null === $url)
        $callback_url = function($url) {return $url;};
    else
        $callback_url = $url;
    $ret = ' ' . $text;
    // urls
    $save = ini_set('pcre.recursion_limit', 10000);
    $retval = preg_replace_callback('#(?<!=[\'"])(?<=[*\')+.,;:!&$\s>])(\()?([\w]+?://(?:[\w\\x80-\\xff\#%~/?@\[\]-]{1,2000}|[\'*(+.,;:!=&$](?![\b\)]|(\))?([\s]|$))|(?(1)\)(?![\s<.,;:]|$)|\)))+)#is', function($matches) use ($callback_url)
    {
        $url = $matches[2];
        $suffix = '';

        /** Include parentheses in the URL only if paired **/
        while ( substr_count( $url, '(' ) < substr_count( $url, ')' ) ) {
            $suffix = strrchr( $url, ')' ) . $suffix;
            $url = substr( $url, 0, strrpos( $url, ')' ) );
        }

        $url = $callback_url($url);
        if ( empty($url) )
            return $matches[0];

        return $matches[1] . "<a href=\"$url\">$url</a>" . $suffix;
    }, $ret);
    if (null !== $retval )
        $ret = $retval;
    ini_set('pcre.recursion_limit', $save);
    // web ftp
    $ret = preg_replace_callback('#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]+)#is', function ($matches) use ($callback_url)
    {
        $ret = '';
        $dest = $matches[2];
        $dest = 'http://' . $dest;
        $dest = $callback_url($dest);
        if ( empty($dest) )
            return $matches[0];

        // remove trailing [.,;:)] from the URL
        if ( in_array( substr($dest, -1), array('.', ',', ';', ':', ')') ) === true ) {
            $ret = substr($dest, -1);
            $dest = substr($dest, 0, strlen($dest)-1);
        }
        return $matches[1] . "<a href=\"$dest\">$dest</a>$ret";
    }, $ret);
    // email
    $ret = preg_replace_callback('#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', function($matches)
    {
        $email = $matches[2] . '@' . $matches[3];
        return $matches[1] . "<a href=\"mailto:$email\">$email</a>";
    }, $ret);
    $ret = preg_replace("#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i", "$1$3</a>", $ret);
    $ret = trim($ret);
    return $ret;
}
hakre
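make_clickable returns HTML with <a> tags rather than the array the question asks for. The sketch below is a hypothetical helper, `extract_links`, that collects the links into an array instead. It reuses two ideas from make_clickable: a closing parenthesis is kept only when it is paired, and bare "www." hosts get an "http://" prefix. Its matching pattern is a simplified assumption, not the one used above.

```php
<?php
// Hypothetical helper, not part of hakre's function: returns links as an array.
function extract_links(string $text): array
{
    $links = [];
    // Simplified pattern (an assumption): scheme URLs or bare "www." hosts.
    if (!preg_match_all('~\b(?:https?://|www\.)[^\s<>"\']+~i', $text, $matches)) {
        return $links;
    }
    foreach ($matches[0] as $url) {
        // Strip punctuation that most likely ends the sentence, not the URL.
        $url = rtrim($url, '.,;:!?');
        // Keep ")" only if it is paired with "(", as make_clickable does.
        while (substr_count($url, '(') < substr_count($url, ')')) {
            $url = substr($url, 0, strrpos($url, ')'));
        }
        $url = rtrim($url, '.,;:!?');
        // Bare "www." hosts get an "http://" prefix, as in make_clickable.
        if (stripos($url, 'www.') === 0) {
            $url = 'http://' . $url;
        }
        $links[] = $url;
    }
    return $links;
}

print_r(extract_links('Docs (see http://thesite.com/wiki/PHP_(language)) and www.thesite.com.'));
```

The parenthesis rule lets Wikipedia-style URLs such as ".../PHP_(language)" survive while dropping the ")" that closes the surrounding sentence.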

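You have to use regular expressions. PHP used to offer two families of regex functions, preg (PCRE) and ereg (POSIX). ereg was easier to use but slower, and it was deprecated in PHP 5.3 and removed in PHP 7.0, so use the preg functions.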

Here is a simple preg call that will get URLs from $text.

preg_match_all("/https?:\/\/[^\s]+/i", $text, $urls);

$urls[0] is an array of your URLs ($urls itself holds one array per match group).

FMCorz
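preg_match_all() stores the full matches at index 0, so the URLs end up in $urls[0] rather than directly in $urls. The short sketch below reads them out and, as an optional extra step, removes repeated URLs with array_unique(). The sample text is made up for illustration.

```php
<?php
$text = 'http://thesite.com and http://www.thesite.com then http://thesite.com again';

preg_match_all("/https?:\/\/[^\s]+/i", $text, $urls);

// Index 0 is the list of full matches, i.e. the URLs.
$links = $urls[0];

// Optional: drop repeated URLs and reindex the array from 0.
$unique = array_values(array_unique($links));

print_r($unique);
```

array_unique() keeps the first occurrence of each value but preserves the original keys, which is why array_values() is used to reindex.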