
For a special value, I tried the HTML <option value="&#0;">unspecified</option>, but it seems that the NUL character is not interpreted in HTML: I'm getting � instead. I'd like to know why, and what other unusual UTF-8 characters besides NUL I may have to watch out for.

Here's a fiddle to demonstrate what I'm talking about.

<select><option value="&#0;">&#0;</option></select>

As you can see above, the dropdown is set up with NUL values, but they are converted to � when JavaScript inspects the results.

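var select = document.querySelector('select')
// 1. Inspect the option as parsed from the HTML source.
inspect()
// 2. Assign NUL directly through the DOM, then inspect again.
select.options[0].value = '\u0000'
select.options[0].label = '\u0000'
inspect()
// 3. Round-trip the markup through the HTML parser, then inspect once more.
select.innerHTML = select.innerHTML
inspect()
// Show the URL-encoded value and label, followed by the serialized markup.
function inspect() {
  alert(encodeURIComponent(select.options[0].value)
        + ','
        + encodeURIComponent(select.options[0].label)
        + ','
        + select.innerHTML)
}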

JavaScript can explicitly set value and label to \u0000 and it works, but for some reason the NUL does not survive when it is written in the HTML.

Can you explain why and/or point to the relevant documentation? Are there other UTF-8 characters that will be substituted in a similar manner?

700 Software
  • What did you expect it to be rendered as? – rollingBalls Oct 19 '15 at 22:53
  • `NULL` is mentioned in the second note under [8.2.2.5 Preprocessing the input stream](http://www.w3.org/TR/html5/syntax.html#preprocessing-the-input-stream) in W3C's HTML/XHTML Recommendation. It reads as "it depends" to me :P – Jongware Oct 19 '15 at 22:55
  • @rollingBalls, I'm expecting `%00,%00,` in all three alerts - where NUL is the browser-specific representation of a NUL character. – 700 Software Oct 19 '15 at 22:55
  • Why not use a known value, is this for an actual application or for a homework assignment? Empty string, 0, etc works fine in these cases. – Seano666 Oct 19 '15 at 22:59
  • Using a known value (and optional prefix of other values) is certainly the right choice in this case. While it's been a long time since I've been given homework, it is true that this question is not for a practical problem, but rather to gain a better understanding of UTF-8 and its relationship with HTML. – 700 Software Oct 19 '15 at 23:00

2 Answers


There's a character reference override table in the HTML5 spec for the mapping of numeric character references. The first entry in it is for &#0;, which is mapped to U+FFFD REPLACEMENT CHARACTER (�).

This is followed by some prose stating that numbers in the range 0xD800 to 0xDFFF or greater than 0x10FFFF are also mapped to the Unicode replacement character.
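
To make those rules concrete, here is a minimal sketch of the mapping as described above. The function name resolveNumericReference is made up for illustration and is not part of any browser API. It only models the three cases mentioned here (zero, the surrogate range, and values above 0x10FFFF), and it leaves out the remaining entries of the override table, which as far as I recall cover 0x80 to 0x9F.

```javascript
// Hypothetical helper (not a browser API): a simplified model of how an
// HTML5 parser resolves the number in a reference such as &#0; or &#xD800;.
const REPLACEMENT = '\uFFFD'; // U+FFFD REPLACEMENT CHARACTER

function resolveNumericReference(codePoint) {
  // &#0; is the first entry in the override table.
  if (codePoint === 0) return REPLACEMENT;
  // Surrogate code points (0xD800 to 0xDFFF) are replaced.
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return REPLACEMENT;
  // Anything beyond the last Unicode code point is replaced as well.
  if (codePoint > 0x10FFFF) return REPLACEMENT;
  // Assumption: every other number maps to itself (the rest of the
  // override table is deliberately omitted from this sketch).
  return String.fromCodePoint(codePoint);
}

// Print each input next to its URL-encoded result, as in the question.
for (const n of [0x00, 0x41, 0xD800, 0x110000]) {
  console.log('0x' + n.toString(16), encodeURIComponent(resolveNumericReference(n)));
}
```

With this model, &#0; comes out as %EF%BF%BD (the UTF-8 encoding of U+FFFD) rather than %00, which matches what the first and third alerts in the question show.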

Alohci

NUL is invalid. HTML is a text-based document format. Only character strings may be entered.

https://developers.whatwg.org/elements.html#attributes

Except where otherwise specified, attributes on HTML elements may have any string value, including the empty string. Except where explicitly stated, there is no restriction on what text can be specified in such attributes.

Rob
  • Unless NUL was explicitly stated, I assume the quote means NUL is (or should be) permitted - I've always considered `"\u0000"` to qualify among the family of "any string value". – 700 Software Oct 19 '15 at 23:11