Issue with html decoding in Delphi

Question

I have a problem with TNetEncoding.HTML.Decode in Delphi 10.3.3 (Rio).

The decoding does not seem to work correctly, but maybe I'm doing something wrong.

var s := TNETEncoding.HTML.Decode('carri&egrave;re');

returns in s:

'carri&egrave;re'

It should be 'carrière' (without the quotes). See: https://www.convertstring.com/en/EncodeDecode/HtmlDecode

Am I doing something wrong here?

Andreas Rejbrand · Accepted Answer · 2020-05-19T12:15:59.523

According to the official documentation of THTMLEncoding, it only supports character entities for the reserved HTML characters ", &, <, and >:

THTMLEncoding only encodes reserved HTML characters: "&<>.

But it can also decode numeric character references:

THTMLEncoding supports decoding any HTML numeric character reference, such as &#169; or &#254;, as well as the character entity references of reserved HTML characters: &quot;, &amp;, &lt;, &gt;.

So the only named character entities it supports are the ones for the HTML reserved characters ", &, <, and >.

Indeed, the documentation explicitly warns:

Warning: Decoding character entity references of non-reserved characters, such as &apos; or &copy;, is not supported. The input data must not contain any other character entity references. Otherwise, the output data may be corrupted.
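
To make the documented behaviour concrete, here is a minimal console sketch. It assumes the decoder behaves as the documentation quoted above describes; the program name is made up for illustration. The checks compare against expected strings instead of printing the decoded text, so the console code page does not matter. The unsupported case is only printed, not checked, because the documentation leaves its result undefined.

```delphi
program HtmlDecodeDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.NetEncoding;

begin
  // Entities for the reserved characters are documented as supported.
  Writeln(TNetEncoding.HTML.Decode('a &lt; b &amp; c &gt; d') = 'a < b & c > d');

  // Numeric character references are documented as supported.
  // &#232; is U+00E8 (e with grave accent), written here as #$00E8.
  Writeln(TNetEncoding.HTML.Decode('carri&#232;re') = 'carri'#$00E8're');

  // Other named entities such as &egrave; are documented as unsupported.
  // The question reports that the input comes back unchanged in 10.3.3,
  // but the documentation only says the output may be corrupted.
  Writeln(TNetEncoding.HTML.Decode('carri&egrave;re'));
end.
```

If the first two lines print TRUE, the decoder matches the documentation for the supported cases, and the third line shows what your Delphi version does with an unsupported named entity.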

Fortunately, this SO question (and the answer by Ian Boyd) contains code to decode HTML character entities other than those for the reserved characters.
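
If you only need a handful of named entities, a small lookup table in front of the built-in decoder may be enough. The following is a minimal sketch of that idea, not the code from the linked answer: the type TEntity, the table Entities, and the function DecodeHtmlWithNamedEntities are all hypothetical names, and the table covers only four entities.

```delphi
program NamedEntityDemo; // hypothetical name, for illustration only

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.NetEncoding;

type
  // Hypothetical helper type: one named entity and the character it stands for.
  TEntity = record
    Name: string;
    Ch: Char;
  end;

const
  // Deliberately tiny table; a complete one would cover the full HTML entity list.
  Entities: array[0..3] of TEntity = (
    (Name: '&egrave;'; Ch: #$00E8),
    (Name: '&eacute;'; Ch: #$00E9),
    (Name: '&copy;';   Ch: #$00A9),
    (Name: '&apos;';   Ch: #39)
  );

// Replaces the named entities from the table, then lets TNetEncoding.HTML
// handle the reserved-character entities and numeric references.
function DecodeHtmlWithNamedEntities(const S: string): string;
var
  E: TEntity;
begin
  Result := S;
  // Entity names are case-sensitive (&Egrave; differs from &egrave;),
  // so rfIgnoreCase is intentionally not used.
  for E in Entities do
    Result := StringReplace(Result, E.Name, E.Ch, [rfReplaceAll]);
  Result := TNetEncoding.HTML.Decode(Result);
end;

begin
  // Compare against the expected string; #$00E8 is the decoded &egrave;.
  Writeln(DecodeHtmlWithNamedEntities('carri&egrave;re') = 'carri'#$00E8're');
end.
```

The named entities are replaced before calling Decode so that an escaped ampersand such as &amp;egrave; is not decoded twice. Any named entity missing from the table is still passed to Decode, so the documentation's warning about corrupted output continues to apply to those.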
