
My question is very similar to this, but I didn't find my answer there.

From the link, I could gather that HTML supports the display of ISO 8859/1 8-bit single-byte coded graphic characters, through numerical representations such as:

&32; for Space.

&33; for Exclamation mark.

The above won't be resolved unless the entity names are prefixed with the #:

`&#32;` for Space would be resolved.

`&#33;` for Exclamation mark would be resolved.

What is the reason for prefixing the entity names with the # symbol for these characters, when the ISO Latin 1 Character Entities do not follow the same standard? It can be deduced that the HTML parser is written to deal with these, but it would be great to know why this standard was introduced in the first place.

BatScream
  • What do you mean by "ISO Latin 1 Character Entities do not follow the same standards"? Also, just btw, you can use numeric character references for the entire Unicode range. – Matti Virkkunen May 14 '15 at 18:56
  • @MattiVirkkunen - Thanks for the response. `ISO Latin 1 Character Entities` do not require the entity names to be prefixed by the `#` symbol. – BatScream May 14 '15 at 19:02
  • Oh! I didn't notice because you added the `#`s back in your post. Maybe you should remove them so people can actually see what you're talking about. – Matti Virkkunen May 14 '15 at 19:04
  • @MattiVirkkunen - That would make it more clear. Have edited my post. – BatScream May 14 '15 at 19:07
  • Also you did notice that you linked to a pretty ancient spec (HTML 3), right? I can't find a mention of numeric character references without a `#` in [HTML 4](http://www.w3.org/TR/REC-html40/charset.html#h-5.3.1) for instance. – Matti Virkkunen May 14 '15 at 19:08
  • Thanks for the link. It is there in the document, but I would also like to know why numeric characters cannot be handled the same way without the `#` symbol. Was there a specific reason for this? – BatScream May 14 '15 at 19:15
  • HTML 3 caused lots of problems and was never implemented; it was superseded by HTML 3.2. Most probably, implementing numeric character references without hashes caused backward compatibility problems - pages that were written expecting `&33;` to display as `&33;` suddenly displayed as `!` instead. – Alohci May 15 '15 at 00:01

1 Answer


The full gory details of how these are processed are given in the parsing section of the HTML 5 specification. In particular, follow the links to "consume a character reference".

HTML 3 was never relevant, and even HTML 3.2 was superseded long ago. ISO documents are also irrelevant in this context.

Following the parsing algorithm can be painful (at least, it takes some getting used to), but it is guaranteed correct.
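If you just want to see the outcome without stepping through the algorithm, here is a small sketch using Python's standard `html.unescape`, which is documented as applying the HTML 5 rules for character references. It is an approximation for experimenting, not a substitute for the spec or for a browser's parser; the `resolve` helper and the `samples` list are names made up for this illustration.

```python
import html

# Hypothetical sample inputs: numeric references with '#', a named
# reference, and the hash-less forms from the question.
samples = ["&#32;", "&#33;", "&#x21;", "&excl;", "&32;", "&33;"]


def resolve(text: str) -> str:
    """Decode character references with the standard library's HTML 5 rules."""
    return html.unescape(text)


if __name__ == "__main__":
    for s in samples:
        # repr() makes the space produced by &#32; visible in the output.
        print(repr(s), "->", repr(resolve(s)))
```

With this, `&#32;`, `&#33;` and `&#x21;` decode to the space and exclamation mark, `&excl;` decodes through the named-reference table, and `&32;` / `&33;` come back unchanged because `32` and `33` are not names in that table. The `#` is what tells the parser to read the following digits as a code point rather than look them up as a name.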

Robin Berjon