I'd like to make sure that URLs such as javascript:alert('a'); (and vbscript variants, etc.) are not allowed, by whitelisting https?|ftp. That's easy enough: ^(?:https?|ftp):// But how can I allow relative URLs as well, such as ../../../blah, ./blah and /images/img.png?

In other words, is using ^(?:(?:https?|ftp)://|[./]) safe?
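
To get a feel for what that pattern lets through, here is a small sketch that runs it against a handful of inputs. The sample URLs (including evil.example) and the variable names are made up for illustration; only the pattern itself comes from the question.

```php
<?php
// The pattern from the question, wrapped in '#' delimiters so the slashes need no escaping.
$pattern = '#^(?:(?:https?|ftp)://|[./])#';

// Hypothetical sample inputs, chosen only for illustration.
$samples = array(
    "javascript:alert('a');",   // rejected: no whitelisted prefix
    'http://example.com/',      // accepted: absolute URL
    '../../../blah',            // accepted: starts with "."
    '/images/img.png',          // accepted: starts with "/"
    'images/img.png',           // rejected, although it is a harmless relative URL
    '//evil.example/x.js',      // accepted: protocol-relative URL pointing at another host
    './x" onclick="alert(1)',   // accepted: only the start of the string is checked
);

$results = array();
foreach ($samples as $url) {
    $results[$url] = preg_match($pattern, $url) === 1;
    echo ($results[$url] ? 'accept ' : 'reject ') . $url . PHP_EOL;
}
```

So the prefix check does keep javascript: out, but it also accepts protocol-relative URLs, rejects relative URLs without a leading dot or slash, and says nothing about what follows the first character.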

I've asked around, and a possible solution might be to use parse_url and check the scheme:

if !scheme or scheme == http or scheme == https or scheme == ftp or scheme == mailto

John

2 Answers

Instead of using regular expressions you could use parse_url and check that scheme is either empty or one of http, https, and ftp:

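$components = parse_url($url);
// parse_url() returns false for seriously malformed URLs; treat those as invalid
if ($components !== false
    && (!isset($components['scheme'])
        || in_array(strtolower($components['scheme']), array('http', 'https', 'ftp'), true))) {
    // valid
} else {
    // invalid
}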
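
As a rough sketch of how that check might be packaged and exercised, the following wraps it in a function. The name is_allowed_url() and the extra whitespace/control-character rejection are assumptions added here, not part of the snippet above; the extra check is there because browsers strip some of those characters before working out the scheme.

```php
<?php
// Hypothetical helper; the name and the exact rejection rules are assumptions.
function is_allowed_url($url)
{
    // Reject whitespace and control characters anywhere in the string,
    // since browsers drop some of them before resolving the scheme.
    if (preg_match('/[\x00-\x20\x7f]/', $url)) {
        return false;
    }

    $components = parse_url($url);
    if ($components === false) {
        return false; // seriously malformed URL
    }

    if (!isset($components['scheme'])) {
        // No scheme: relative URL such as ../../blah or /images/img.png.
        // Note that protocol-relative URLs (//host/path) also end up here.
        return true;
    }

    return in_array(strtolower($components['scheme']), array('http', 'https', 'ftp'), true);
}

// Illustrative inputs only.
foreach (array('/images/img.png', 'HTTPS://example.com/', "javascript:alert('a');") as $candidate) {
    echo $candidate . ' => ' . (is_allowed_url($candidate) ? 'valid' : 'invalid') . PHP_EOL;
}
```

Rejecting every space is deliberately strict; if URLs with spaces must be accepted, they would need to be percent-encoded before this check.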
Gumbo

Also see: Sanitizing strings to make them URL and filename safe?

I'm trying to filter URLs to go into <a href="" or <img src=""

Be careful, because it's possible to "break out" of the attribute with just a "starts with" regular expression. For instance, I could provide http://safeurl.com" onclick="alert('xss attack'), and when inserted into your attribute you would have:

<a href="http://safeurl.com" onclick="alert('xss attack')">

Make sure to urlencode() the value as well as any other security you're doing.
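
For the attribute break-out specifically, the characters that matter are the quotes. Here is a minimal sketch, assuming the value ends up inside a double-quoted HTML attribute, that escapes it with htmlspecialchars() and ENT_QUOTES. That function is a choice made for this illustration, and the variable names are hypothetical.

```php
<?php
// The hostile value from the example above.
$url = 'http://safeurl.com" onclick="alert(\'xss attack\')';

// ENT_QUOTES converts both double and single quotes to HTML entities.
$escaped = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');

// The only literal double quotes left are the two that delimit the attribute.
$html = '<a href="' . $escaped . '">link</a>';
echo $html . PHP_EOL;
```

Note that urlencode() applied to the whole URL would also encode the : and / characters, so it is not a drop-in replacement for this step.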

I would probably advise against allowing ../../relative/urls at all; failing that, use parse_url as Gumbo has suggested.

Check out the info on OWASP.org for some more advice.

Wesley Murch
  • Thanks, I was going to use htmlentities to properly escape the data before putting it in the attributes. I'm not sure urlencode() should be used? – John May 03 '11 at 07:50
  • I would use urlencode() on each slashed segment to be safe. This is definitely worth writing a function for, something really solid and paranoid. – Wesley Murch May 03 '11 at 10:09
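
One way to read "encode each slashed segment" is sketched below. It assumes the input is only the path part of a URL, with no scheme, query string or fragment. It uses rawurlencode() rather than urlencode(), so that spaces become %20 instead of +. The function name encode_path_segments() is hypothetical.

```php
<?php
// Hypothetical helper: percent-encode each "/"-separated segment of a path.
// The slashes themselves are preserved because they are removed by explode()
// and put back by implode().
function encode_path_segments($path)
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

echo encode_path_segments('/images/my img".png') . PHP_EOL;
echo encode_path_segments('../../blah') . PHP_EOL;
```

Input that is already percent-encoded would be encoded a second time (a literal % becomes %25), so this only suits raw, unencoded paths.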