I'm using C# to call a REST API that returns a JSON object containing HTML. Here is an example of the object I'm interested in:

{    
    "Body": "<html class=\"sg-campaigns\"><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><meta content=\"text/html; charset=utf-8\"><meta name=\"viewport\" content=\"width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1\"><meta content=\"IE=Edge\"><style type=\"text/css\">\r\n<!--\r\nbody ..."
}

I would like to process the HTML further, but because it contains the escape sequences needed to make it a valid JSON string, deserialization with System.Text.Json fails with the following exception:

System.Text.Json.JsonReaderException: '<' is an invalid start of a value.

I have tried using the following code to deserialize the content of the Body attribute:

var options = new JsonSerializerOptions()
{
    Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};

var content = JsonSerializer.Deserialize<String>(html, options);

The elements causing errors include, for example:

  • \"
  • <!--
  • \r, \n, \t

I'd like to learn how the Body attribute from the code above can be cleaned so that it contains only valid HTML. Maybe someone from the community has an idea.

Heretic Monkey
nor0x

1 Answer

Create a simple object that contains a Body property (and any other properties you're going to use):

internal class ResponseObject
{
    public string Body { get; set; }
}

Then deserialize the response JSON to that type of object instead of String:

var content = JsonSerializer.Deserialize<ResponseObject>(html, options);

content.Body will contain the decoded HTML.
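
For completeness, here is a minimal end-to-end sketch of that approach. It assumes the string passed to the deserializer is the complete JSON object (including the outer braces and the "Body" key) rather than the HTML on its own. An error such as "'<' is an invalid start of a value" usually suggests the input began with HTML instead of JSON. The sample payload is shortened for illustration, and the Demo class and ExtractBody helper are hypothetical names, not part of the original code.

```csharp
using System;
using System.Text.Json;

internal class ResponseObject
{
    public string Body { get; set; }
}

internal static class Demo
{
    // Hypothetical helper: deserializes the whole JSON document and returns Body.
    // The JSON reader decodes escape sequences such as \" \r \n on its own,
    // so no manual clean-up of the HTML is needed afterwards.
    public static string ExtractBody(string json)
    {
        ResponseObject response = JsonSerializer.Deserialize<ResponseObject>(json);
        return response?.Body;
    }

    private static void Main()
    {
        // Verbatim string: "" is a literal quote, and backslashes are kept as-is,
        // so the JSON text below contains \" and \r\n escape sequences.
        string json = @"{ ""Body"": ""<html class=\""sg-campaigns\""><head>\r\n<meta content=\""IE=Edge\""></head></html>"" }";

        // Prints the HTML with plain quotes and an actual line break.
        Console.WriteLine(ExtractBody(json));
    }
}
```

Note that no JsonSerializerOptions are passed here. The Encoder and WriteIndented settings only affect how JSON is written during serialization, so they should not be needed for reading the response.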

Ben Osborne