How can I treat the following string as an invalid URL path, given that it contains a hostname and does not represent a valid path?

/<>//google.com

I am using the following regex validation code:

preg_match("/(?:[\w-]+\.)+[\w-]+/", $url, $matches);

It currently matches both /<>//google.com and /3.2/, even though "/3.2/" is a legitimate URL path and not a domain name.

Yair Nevet

2 Answers

Perhaps you can use FILTER_VALIDATE_URL and pass the FILTER_FLAG_PATH_REQUIRED flag as well.

filter_var('http://host.com/path', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED);

https://www.php.net/manual/en/filter.filters.validate.php

https://www.php.net/manual/en/intro.filter.php
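
As a minimal sketch of how that flag behaves: FILTER_VALIDATE_URL checks full URLs (scheme and host included), so a bare path such as /3.2/ would not be expected to pass on its own. The helper name url_has_path below is hypothetical and only wraps the filter_var call shown above.

```php
<?php
// Hypothetical helper (not part of PHP): true when $url is a full URL
// that FILTER_VALIDATE_URL accepts and that also has a path component.
function url_has_path(string $url): bool
{
    // filter_var() returns the filtered string on success, false on failure.
    return filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED) !== false;
}

var_dump(url_has_path('http://host.com/path')); // full URL with a path
var_dump(url_has_path('http://host.com'));      // full URL without a path
var_dump(url_has_path('/3.2/'));                // bare path, no scheme or host
```

Because the filter works on whole URLs, it says little about whether the path part itself is well-formed, so a separate path check may still be needed.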

user1392897

Based on the answers here and here, I came up with something like this:

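function is_valid_url_path($url_path) {
  // Optional slashes, one run of allowed characters, optional slashes,
  // then any number of percent-encoded bytes (% plus two hex digits).
  return preg_match("#^\/*[a-z0-9+&@=~_|!:,.;-]*\/*(%[0-9a-f]{2})*\/*$#i", $url_path);
}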

print is_valid_url_path('/3.2/'); // 1
print is_valid_url_path('//3.2/'); // 1
print is_valid_url_path('/3.2///%3F'); // 1
print is_valid_url_path('/3.2///%3'); // 0
print is_valid_url_path("/<>//google.com"); // 0

Things I've considered within the regex:

  • Allowed chars: a-z A-Z 0-9 . - _ ~ ! $ & ' ( ) * + , ; = : @

  • Percent-encoding: % followed by two hexadecimal digits (e.g. %23, %3B)

  • A path can contain multiple empty segments, e.g. ///hello//world

  • A path is terminated either with a ?, # or simply by the end of the URI

Also see RFC 3986, Sec. 3.3. Path.
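
The points above can also be sketched as a pattern that accepts any number of segments, which the function above does not (it allows a single run of characters). This is only a rough reading of the RFC 3986 path grammar, not a full implementation: the name is_abs_path is hypothetical, the path must start with a slash, and a leading // is still accepted.

```php
<?php
// Hypothetical helper, a sketch only: accepts one or more "/segment" parts,
// where each segment may be empty and is built from RFC 3986 "pchar"s.
function is_abs_path(string $path): bool
{
    // pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
    $pchar = '(?:[A-Za-z0-9._~!$&\'()*+,;=:@-]|%[0-9A-Fa-f]{2})';

    // \z anchors at the very end, so a trailing newline is rejected too.
    return preg_match('#^(?:/' . $pchar . '*)+\z#', $path) === 1;
}

var_dump(is_abs_path('/3.2/'));
var_dump(is_abs_path('///hello//world'));
var_dump(is_abs_path('/3.2///%3'));
var_dump(is_abs_path('/<>//google.com'));
```

Rejecting /<>//google.com here comes from the < and > characters, not from detecting the hostname, so an input like //google.com would still need its own check if it has to be refused.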

Kenan Güler