I want to double-check this, and I believe the answer will be helpful to others. If someone uses htmlspecialchars($var) in their code and is running a PHP version prior to 5.4, then they're open to UTF-7 XSS. That's a given. Am I correct in assuming that the site would still be open to UTF-7 XSS even if the character set in the Content-Type header is UTF-8, since PHP's server-side content character set defaults to ISO-8859-1?

Edit: I was asked what I hope to gain from this. I want to make sure a project isn't vulnerable to UTF-7 XSS, since some programmers don't seem inclined to set the third parameter of htmlspecialchars, which is the character set. If you understand the server character set I mentioned and how it relates to UTF-7, then I could really use your help.

  • What is your expected profit out of this question? – Siva Tumma Jan 01 '14 at 05:13
  • The UTF-7 vulnerability exploits specific *characters*. Is your content served in UTF-7? If not, you likely have nothing to worry about – Pekka Jan 01 '14 at 05:23
  • @sivatumma I updated my question. Pekka, I wish that were the case, but it isn't. – user3148596 Jan 01 '14 at 06:11
  • I don't think the use of `htmlspecialchars` is the problem; it's just that its protection can be evaded when the browser interprets the document as UTF-7. From my understanding this only happens in particular versions of IE when no character set has been specified, so the fix should just be to remember to set your character set everywhere. – Waleed Khan Jan 01 '14 at 17:01

2 Answers

Assuming you are talking about outputting user-controlled values to the page: if the HTTP response header declares UTF-8, like so

Content-Type: text/html; charset=utf-8

then XSS cannot be achieved using UTF-7 encodings.

SilverlightFox

The charset parameter of htmlspecialchars() has no impact on UTF-7 attacks. The byte that has special powers in UTF-7 is 0x2B (ASCII +), and htmlspecialchars() never escapes it, whichever charset you pass.
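
To make that concrete, here is a minimal sketch in Python rather than PHP, with `html.escape` standing in (as an assumption, not an exact equivalent) for `htmlspecialchars`; both escape `&`, `<` and `>` and leave `+` alone. The payload is the well-known UTF-7 spelling of a script tag.

```python
import html

# A script tag spelled in UTF-7: "+ADw-" is "<" and "+AD4-" is ">".
payload = "+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# HTML-escaping finds nothing to escape: the payload is plain ASCII
# with no "&", "<", ">" or quote characters in it.
escaped = html.escape(payload)

# What a browser would see if it decided to read the page as UTF-7.
decoded = escaped.encode("ascii").decode("utf-7")

print(escaped == payload)  # True
print(decoded)             # <script>alert(1)</script>
```

The escaped output is byte-for-byte the same as the input, so the only thing that decides whether it is harmless is how the browser decodes the page.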

If you have a user string (in an ASCII-compatible encoding like, say, UTF-8) that you wanted to include on a web page that used the UTF-7 encoding, then you'd have to convert that string using iconv('utf-8', 'utf-7', $str) after calling htmlspecialchars on the UTF-8 string. This charset conversion is a separate operation from HTML-escaping.
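
The order of those two steps can be sketched as follows, again in Python as a stand-in: `html.escape` plays the role of `htmlspecialchars` and `str.encode("utf-7")` the role of `iconv`. The helper name `to_utf7_html` is hypothetical and only for illustration.

```python
import html

def to_utf7_html(user_text):
    """HTML-escape first, then transcode the escaped text to UTF-7 bytes."""
    escaped = html.escape(user_text)   # step 1: HTML-escaping
    return escaped.encode("utf-7")     # step 2: charset conversion

encoded = to_utf7_html("1 + 1 < 3")

# Decoding as UTF-7 gives back the escaped text, not the raw input.
round_trip = encoded.decode("utf-7")
print(round_trip)  # 1 + 1 &lt; 3
```

Note that it is the charset conversion, not the escaping, that rewrites the literal `+` (as `+-`) so a UTF-7 decoder does not treat it as the start of a shifted sequence.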

In theory you could use htmlspecialchars($s, ENT_xxx, 'utf-7') to HTML-encode a string that was already in UTF-7 encoding, except that, unlike the iconv extension, the native-PHP htmlspecialchars function doesn't support UTF-7.

But the point is moot because modern browsers won't allow you to use UTF-7 and no-one ever deliberately authored a UTF-7 web page.

Real UTF-7 attacks happen not due to missing HTML-encoding, but because a browser treats a page as containing UTF-7 bytes when this was not intended. That is easy to prevent by including an explicit charset declaration, either in the HTTP Content-Type header (as demonstrated by SilverlightFox, +1), or in a <meta> element included in the page before any user content.
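
As a rough illustration of both declarations together, here is a small Python sketch of a page renderer. The function name `render_page` and the page skeleton are assumptions made for the example; in PHP the header part would be a `header('Content-Type: text/html; charset=utf-8')` call before any output.

```python
import html

def render_page(user_text):
    """Return (headers, body) with the charset declared in both places."""
    # Declaration 1: the HTTP Content-Type header.
    headers = [("Content-Type", "text/html; charset=utf-8")]
    # Declaration 2: a <meta> element placed before any user content.
    page = (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        '<meta charset="utf-8">\n'
        "<title>Demo</title>\n"
        "</head><body>\n"
        "<p>" + html.escape(user_text) + "</p>\n"
        "</body></html>\n"
    )
    return headers, page.encode("utf-8")

headers, body = render_page("+ADw-script+AD4-alert(1)+ADw-/script+AD4-")
```

With the charset stated up front, the browser has no reason to guess, and the `+ADw-` sequences in the body stay inert text.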

bobince
  • This was not about UTF-7 as the character set actually in use; it was about the issues in ISO-8859-1 that allow certain UTF-7 characters through. – user3148596 Jan 05 '14 at 23:11