From the HTML Standard § 4.2.5.4 Specifying the document's character encoding:
The Encoding standard requires use of the UTF-8 character encoding and requires use of the "utf-8" encoding label to identify it. Those requirements necessitate that the document's character encoding declaration, if it exists, specifies an encoding label using an ASCII case-insensitive match for "utf-8". Regardless of whether a character encoding declaration is present or not, the actual character encoding used to encode the document must be UTF-8.
(…)
If an HTML document does not start with a BOM, and its encoding is not explicitly given by Content-Type metadata, and the document is not an iframe srcdoc document, then the encoding must be specified using a meta element with a charset attribute or a meta element with an http-equiv attribute in the Encoding declaration state.
Note. A character encoding declaration is required (either in the Content-Type metadata or explicitly in the file) even when all characters are in the ASCII range, because a character encoding is needed to process non-ASCII characters entered by the user in forms, in URLs generated by scripts, and so forth.
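For concreteness, here is a minimal sketch of the two in-document declaration forms the quoted passage refers to. The title text and the lang value are placeholders and are not taken from the standard's text above. A document would use one form or the other, not both.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <!-- Form 1: a meta element with a charset attribute -->
  <meta charset="utf-8">

  <!-- Form 2 (alternative to Form 1): a meta element with an http-equiv
       attribute in the Encoding declaration state:
       <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  -->
  <title>Placeholder title</title>
</head>
<body></body>
</html>
```

The out-of-band alternative mentioned in the quote is an HTTP response header such as Content-Type: text/html; charset=utf-8, in which case neither meta form is required.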
So my understanding is that:
- There is only one allowed encoding, namely UTF-8.
- Nonetheless the encoding must still be explicitly specified.
Why?
Isn't it redundant to specify the encoding if the encoding must always be UTF-8?
Provided that the document declares itself as HTML5 (e.g. with <!DOCTYPE html>, as opposed to something like <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">), couldn't the character encoding declaration be optional, with user agents defaulting to UTF-8 when none is specified?