According to the official documentation of `THTMLEncoding`, it only supports character entities for the reserved HTML characters `"`, `&`, `<`, and `>`:
THTMLEncoding only encodes reserved HTML characters: `"&<>`.
But it is also able to decode numeric character references:
THTMLEncoding supports decoding any HTML numeric character reference, such as `&#169;` or `&#xFE;`, as well as the character entity references of reserved HTML characters: `&quot;`, `&amp;`, `&lt;`, `&gt;`.
So the only named character entities it supports are the ones for the HTML reserved characters `"`, `&`, `<`, and `>`.
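The supported cases can be exercised with a small console program. This is only a minimal sketch: it assumes a Delphi version that ships the `System.NetEncoding` unit and uses the `TNetEncoding.HTML` instance; the program name and the sample strings are made up for illustration.

```delphi
program HtmlEncodingDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.NetEncoding;

begin
  // Encoding only touches the reserved characters " & < >
  Writeln(TNetEncoding.HTML.Encode('a < b & "c"'));

  // Entity references of the reserved characters are documented as supported
  Writeln(TNetEncoding.HTML.Decode('&lt;b&gt; &amp; &quot;x&quot;'));

  // Numeric character references (decimal and hexadecimal) are documented
  // as supported too; how the result is displayed depends on the console
  // code page
  Writeln(TNetEncoding.HTML.Decode('&#169; &#xFE;'));
end.
```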
Indeed, the documentation explicitly warns:
Warning: Decoding character entity references of non-reserved characters, such as `&apos;` or `&copy;`, is not supported. The input data must not contain any other character entity references. Otherwise, the output data may be corrupted.
Fortunately, this SO question (and the answer by Ian Boyd) contains code to decode HTML character entities other than those for the reserved characters.
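If only a handful of named entities can occur in your input, one possible workaround is to rewrite them as numeric character references first and then let `THTMLEncoding` do the rest. The following is a rough sketch of that idea, not the code from the linked answer: `DecodeHtmlWithNamedEntities` is a hypothetical helper and its lookup table is a tiny illustrative subset of the more than two thousand named entities that HTML defines.

```delphi
program NamedEntityDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  System.NetEncoding;

// Hypothetical helper (not part of the RTL): maps a few named entities to
// numeric references, which THTMLEncoding is documented to decode.
function DecodeHtmlWithNamedEntities(const Input: string): string;
const
  // Illustrative subset only; extend the table as your data requires
  NamedEntities: array[0..3, 0..1] of string = (
    ('&apos;',  '&#39;'),
    ('&copy;',  '&#169;'),
    ('&nbsp;',  '&#160;'),
    ('&thorn;', '&#254;')
  );
var
  I: Integer;
begin
  Result := Input;
  // Entity names are case-sensitive, so rfIgnoreCase is deliberately omitted
  for I := Low(NamedEntities) to High(NamedEntities) do
    Result := StringReplace(Result, NamedEntities[I, 0], NamedEntities[I, 1],
      [rfReplaceAll]);
  // The remaining reserved-character entities and all numeric references
  // are handled by the standard decoder
  Result := TNetEncoding.HTML.Decode(Result);
end;

begin
  Writeln(DecodeHtmlWithNamedEntities('&copy; 2024 Tom &amp; Jerry&apos;s'));
end.
```

The replacement runs before the standard decoder on purpose: an escaped ampersand such as `&amp;copy;` does not contain the text `&copy;`, so it is left alone by the lookup and only its `&amp;` is decoded afterwards.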