I use the following regex to find and process URLs within a string:

$regex = '/(?:www\.|https?:\/\/)?[a-z0-9]+\.[a-z0-9]{2,4}\S*\b/';

$string = preg_replace_callback($regex, function($matches) {
    // ... use as $matches[0]
}, $string);

My problem starts with prices: for example, the regex matches 3.99 in the string "Only 3.99$ today!". That should not happen. From my research, I understand that finding and validating URLs is not a simple task.

Still, if no all-numeric TLD exists, I could simply remove 0-9 from the character class that matches the extension. However, I couldn't find any statement that TLDs cannot consist of digits only, so the question remains open.
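
To make the idea concrete, here is a minimal sketch of that change, assuming no all-numeric TLD needs to be matched. The helper findUrls() and the sample strings are hypothetical and only serve to compare the two patterns; the first pattern is the one I use now, the second differs only in the extension part.

```php
<?php
// Current pattern: digits are allowed in the extension.
$withDigits = '/(?:www\.|https?:\/\/)?[a-z0-9]+\.[a-z0-9]{2,4}\S*\b/';

// Variant under test: letters only in the extension
// (assumption: no TLD made up of digits has to be matched).
$lettersOnly = '/(?:www\.|https?:\/\/)?[a-z0-9]+\.[a-z]{2,4}\S*\b/';

// Hypothetical helper: returns every substring the pattern matches.
function findUrls(string $regex, string $text): array
{
    preg_match_all($regex, $text, $matches);
    return $matches[0];
}

$price = 'Only 3.99$ today!';
$link  = 'See www.example.com/sale for details';

print_r(findUrls($withDigits, $price));   // the price is picked up as a URL
print_r(findUrls($lettersOnly, $price));  // no match for the price
print_r(findUrls($lettersOnly, $link));   // the real URL is still found
```

With these inputs, the first call returns 3.99, the second returns an empty array, and the third returns www.example.com/sale. The sketch only shows that the price no longer matches; it does not settle whether a digits-only TLD could exist in the future.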

Is it safe to drop the 0-9 range for long-term use?

Itay Ganor
  • What does the RFC that defines the domain naming system say about digits in the TLD? (Note: on the public internet there are *currently* no TLDs using digits; this does not mean they are not allowed.) – Richard Mar 27 '18 at 10:07
  • According to the [IANA DB](https://www.iana.org/domains/root/db), there are currently no TLDs with digits. – Toto Mar 27 '18 at 10:19
