56

I can't believe I can't find this information easily accessible, so:

1) Which characters cannot be incorporated in an XML attribute without entity-encoding them?

Obviously, you need to encode quotes. What about < and >? What else?

2) Where exactly is the official list?

Euro Micelli

3 Answers

60

Here is the definition of what is allowed in an attribute value (the AttValue production in the XML 1.0 specification):

'"' ([^<&"] | Reference)* '"'  |  "'" ([^<&'] | Reference)* "'" 

So, you can't have:

  • the same character that opens/closes the attribute value (either ' or ")
  • a naked ampersand (& must be &amp;)
  • a left angle bracket (< must be &lt;)

You should also not use any characters that are illegal everywhere in an XML document (such as form feeds).
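
The rules above can be sketched as a small helper. This is a minimal illustration in Python, not part of the original answer: the name `escape_attr` and its `quote` parameter are hypothetical, and the illegal-character check covers only the C0 control characters plus U+FFFE and U+FFFF, not every code point the spec excludes.

```python
import re

# Partial check (an assumption of this sketch): C0 controls other than
# tab, LF and CR, plus U+FFFE and U+FFFF, which XML 1.0 never allows.
_ILLEGAL_XML_CHARS = re.compile("[\x00-\x08\x0b\x0c\x0e-\x1f\ufffe\uffff]")


def escape_attr(value: str, quote: str = '"') -> str:
    """Return `value` as a quoted XML attribute value (hypothetical helper)."""
    if quote not in ('"', "'"):
        raise ValueError("quote must be a single or double quote")
    if _ILLEGAL_XML_CHARS.search(value):
        raise ValueError("value contains a character not legal in XML 1.0")

    # The ampersand must be replaced first, otherwise the entities
    # produced by the later replacements would be escaped a second time.
    value = value.replace("&", "&amp;").replace("<", "&lt;")

    # Only the quote character that delimits the value needs escaping.
    if quote == '"':
        value = value.replace('"', "&quot;")
    else:
        value = value.replace("'", "&apos;")
    return quote + value + quote
```

For example, `escape_attr('a < b & "c"')` yields `"a &lt; b &amp; &quot;c&quot;"`, while `>` and the non-delimiting quote character pass through unchanged.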

Phrogz
great_llama
  • It is good to avoid the two-character string `--` as well. Otherwise, you run into trouble if you want to wrap that content in a comment. (Since `--` is not legal inside a comment.) – alex.jordan Sep 10 '19 at 05:22
6

Per the current recommendation (question 2), specifically the section on character data and markup, the characters (question 1) are the ampersand (&), left angle bracket (<), right angle bracket (>), single quote ('), and double quote (").

codehead
  • I agree on the section of the spec document. However, not all of those characters "must" be escaped. Can you edit to clarify? – Euro Micelli May 15 '09 at 02:35
  • -1 There is no requirement to escape `>`, nor the `'` or `"` that is not being used to delimit the attribute. – Phrogz Jan 22 '13 at 19:19
1

See 2.2 Characters in "Extensible Markup Language (XML) 1.0 (Third Edition)".

Note that, at least with .NET, if you use the XML APIs to work with XML, you won't have to worry about this. That is the reason not to treat XML as plain text.
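
As a rough illustration of letting an XML API do the escaping, here is a sketch using Python's standard `xml.etree.ElementTree` module as a stand-in for the .NET APIs mentioned above. The element name `item`, the attribute name `title`, and the sample value are made up for the example.

```python
import xml.etree.ElementTree as ET

# A value containing every character discussed in this thread.
raw_title = 'Tom & "Jerry" <1940> it\'s'

# Build the element through the API; no manual escaping is involved.
elem = ET.Element("item", {"title": raw_title})

# Serialize to a string. The library decides which characters to escape.
xml_text = ET.tostring(elem, encoding="unicode")

# Parse it back: the attribute value comes out exactly as it went in.
round_tripped = ET.fromstring(xml_text).get("title")

print(xml_text)
print(round_tripped == raw_title)
```

The serializer may escape more characters than the spec strictly requires, which is harmless: the parsed value is the same either way.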

John Saunders
  • I agree on the document location, but I don't think that specific section is the correct place to look. That section lists the valid characters allowed in the "text stream", if you will. About .NET and libraries, I couldn't agree more -- but in this particular case I need to edit an existing text file that contains XML. – Euro Micelli May 15 '09 at 02:32
  • So, why not use the XML APIs to process that text file? – John Saunders May 15 '09 at 03:42
  • Better is to use the current version of the documents: https://www.w3.org/TR/xml/ – albert Mar 03 '18 at 19:15