The five characters that OWASP recommends escaping to prevent XSS injection are &, <, >, ", and '.

Among them, I cannot understand why & (ampersand) should be escaped or how it can be used as a vector to inject script. Can somebody give an example where the other four characters are escaped but the ampersand is not, and an XSS vulnerability results?

I have checked the other question, but its answer does not really make things any clearer.

LordWilmore
Jinxin Ni

1 Answer

The answer here addresses the issue only in a nested JavaScript context within an HTML attribute context, whereas your question asks specifically about pure HTML context escaping.

In that question, the escaping should be as per the OWASP recommendation for JavaScript:

Except for alphanumeric characters, escape all characters with the \uXXXX unicode escaping format (X = Integer).

This already handles &, because it is not alphanumeric.
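As a rough illustration of that rule, here is a minimal Python sketch. The function name `js_unicode_escape` is hypothetical (it is not an OWASP or library API), and treating only ASCII letters and digits as "alphanumeric" is an assumption made for this example.

```python
def js_unicode_escape(text: str) -> str:
    """Escape everything except ASCII letters and digits as \\uXXXX.

    Hypothetical helper for illustration only; a real application
    should use a vetted encoding library.
    """
    out = []
    for ch in text:
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            # Emit one \uXXXX per UTF-16 code unit, so characters outside
            # the Basic Multilingual Plane become a surrogate pair.
            data = ch.encode("utf-16-be")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "big")
                out.append("\\u%04X" % unit)
    return "".join(out)


escaped = js_unicode_escape("a&b'c")
print(escaped)  # the & and ' are both replaced by \uXXXX sequences
```

Because & is not alphanumeric, it is encoded like any other punctuation character, so no separate rule for it is needed in the JavaScript context.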

To answer your question, from a practical point of view, why wouldn't you escape the ampersand?

The HTML representation of & is &amp;, so it makes a lot of sense to escape it. If you didn't, any time a user entered &amp;, &lt;, or &gt; into your application, the page would display &, <, or > instead of the literal text &amp;, &lt;, or &gt; that the user typed.
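A small Python sketch of this display problem follows. The function `escape_without_amp` is a hypothetical escaper written for this example, and `html.unescape` from the standard library is used only as an approximation of what a browser would display.

```python
import html


def escape_without_amp(text: str) -> str:
    """Hypothetical escaper: handles <, >, " and ' but deliberately skips &."""
    return (text.replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace('"', "&quot;")
                .replace("'", "&#x27;"))


user_input = "Use &lt;b&gt; for bold"

# Escaped output as it would be written into the HTML page.
without_amp = escape_without_amp(user_input)
with_amp = html.escape(user_input, quote=True)

# html.unescape stands in for the browser decoding character references.
shown_without_amp = html.unescape(without_amp)
shown_with_amp = html.unescape(with_amp)

print(shown_without_amp)  # differs from what the user typed
print(shown_with_amp)     # matches what the user typed
```

With the ampersand left unescaped, the text the user typed is decoded a second time on display. That corrupts the content, but the decoded < and > appear as text rather than being parsed as markup.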

An edge case? Definitely. A security concern? It shouldn't be.

From the HTML5 syntax Character references section:

Character references must start with a U+0026 AMPERSAND character (&). Following this, there are three possible kinds of character references:

  • Named character references
  • Decimal numeric character reference
  • Hexadecimal numeric character reference

When an & is encountered:

Switch to the data state.

Attempt to consume a character reference, with no additional allowed character.

If nothing is returned, emit a U+0026 AMPERSAND character (&) token.

Otherwise, emit the character tokens that were returned.

Therefore, whatever follows the & causes either a literal & or the referenced character to be output. The characters that follow must form a valid reference (an alphanumeric name, or # followed by digits), otherwise nothing is consumed, so there is no chance of a significant character (e.g. ', ", >, <) being consumed and ignored. There is therefore little risk of an attacker changing the parsing context.

However, you never know whether a browser bug deviates from the standard, so I would always escape &. Internet Explorer had an issue where <% was interpreted as <, which allowed .NET Request Validation to be bypassed for XSS attack vectors. Always better to be safe than sorry.
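The parsing behaviour described above can be explored with Python's standard-library `html.unescape`, which is documented as following the HTML5 character reference rules. This is only a sketch: it approximates a conforming parser, not any particular browser, and the sample strings are made up for illustration.

```python
import html

# A stray & followed by a significant character is left alone:
# the following character is neither consumed nor altered.
for tail in ["<", ">", '"', "'"]:
    sample = "&" + tail + "x"
    assert html.unescape(sample) == sample

# Valid references are replaced by the character they name.
named = html.unescape("&lt;")      # named character reference
decimal = html.unescape("&#60;")   # decimal numeric character reference
hexa = html.unescape("&#x3C;")     # hexadecimal numeric character reference

print(named, decimal, hexa)
```

In every case the result is either the ampersand itself or the single character the reference names, which matches the reasoning that an unescaped & on its own does not let an attacker swallow a quote or angle bracket.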

SilverlightFox
    Thank you for your answer. That cleared up my confusion. I think OWASP should document the explanation as well, instead of just stating what should be escaped without giving an example. Some people accept the list without thinking, but anyone who thinks about each character will have the same question. I have seen the same question asked on Stack Overflow, but it has no answer. – Jinxin Ni Sep 01 '16 at 15:03