I use the following regex to find and process URLs within a string:
$regex = '/(?:www\.|https?:\/\/)?[a-z0-9]+\.[a-z0-9]{2,4}\S*\b/';
$string = preg_replace_callback($regex, function($matches) {
// ... use as $matches[0]
}, $string);
My problem starts with prices: for example, this regex matches 3.99 in the string "Only 3.99$ today!". That should not happen. From my research, I understand that finding and validating URLs is not a simple task.
Still, if no all-numeric TLD exists, I can simply remove 0-9 from the character class that matches the extension (the [a-z0-9]{2,4} part). However, I could not find any authoritative statement that a TLD cannot consist of digits only, so the question remains open.
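To make the proposed change concrete, here is a minimal sketch comparing the two patterns. The helper name find_matches and the sample address www.example.com are only illustrative assumptions, not part of my real code, and this says nothing about whether the change is safe in general:

```php
<?php
// Original pattern from above, and the proposed variant with 0-9 removed
// from the extension character class only.
$original = '/(?:www\.|https?:\/\/)?[a-z0-9]+\.[a-z0-9]{2,4}\S*\b/';
$proposed = '/(?:www\.|https?:\/\/)?[a-z0-9]+\.[a-z]{2,4}\S*\b/';

// Hypothetical helper (illustration only): returns every full match
// of $pattern found in $text.
function find_matches(string $pattern, string $text): array {
    preg_match_all($pattern, $text, $m);
    return $m[0];
}

$price = 'Only 3.99$ today!';
$link  = 'Only 3.99$ today at www.example.com!'; // assumed sample input

echo json_encode(find_matches($original, $price)), "\n"; // ["3.99"]
echo json_encode(find_matches($proposed, $price)), "\n"; // []
echo json_encode(find_matches($proposed, $link)), "\n";  // ["www.example.com"]
```

With the digits removed, the price is no longer matched while the URL still is. It would, however, also stop matching any extension that contains a digit.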
Is it safe to drop the 0-9 range from the extension in the long term?