Deserialize JSON encoded HTML code with System.Text.Json

Question

I'm using C# to call a REST API which returns a JSON object containing HTML code. Here is an example of the object I'm interested in

{    
    "Body": "<html class=\"sg-campaigns\"><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><meta content=\"text/html; charset=utf-8\"><meta name=\"viewport\" content=\"width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1\"><meta content=\"IE=Edge\"><style type=\"text/css\">\r\n<!--\r\nbody ..."
}

I would like to further process the HTML code but since it contains various elements to be a valid JSON string deserialization with System.Text.Json fails with the following exception

System.Text.Json.JsonReaderException: '<' is an invalid start of a value.

I have tried using the following code to deserialize the content of the Body attribute

var options = new JsonSerializerOptions()
{
    Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};

var content = JsonSerializer.Deserialize<String>(html, options);

The elements causing errors are for example:

\ "
< !--
\r, \n, \t

I'm curious to learn how the Body attribute from the code above can be cleaned to only contain valid HTML, maybe someone from the community has an idea about this.

thanks for your reply! yes i have tried using JsonSerializerOptions with Encoder settings. I have updated my question with the info — nor0x, Jun 28 '21 at 12:12
Have you tried escaping the carriage return(`\\r`), tab(`\\t`) and newline(`\\n`) characters? — tinstaafl, Jun 28 '21 at 14:47

score 1 · Accepted Answer · answered Jun 28 '21 at 12:28

1

Create a simple object that contains a Body property (and any other properties you're going to use):

internal class ResponseObject
{
    public string Body { get; set; }
}

Then deserialize the response JSON to that type of object instead of String.:

var content = JsonSerializer.Deserialize<ResponseObject>(html, options);

Content.Body will contain the decoded HTML.

answered Jun 28 '21 at 12:28

Ben Osborne

1,412
13
20

i feel stupid now what an easy solution. Thank you very much. – nor0x Jun 28 '21 at 12:42

Deserialize JSON encoded HTML code with System.Text.Json

1 Answers1