I'm using C# to call a REST API that returns a JSON object containing HTML code. Here is an example of the object I'm interested in:
{
"Body": "<html class=\"sg-campaigns\"><head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><meta content=\"text/html; charset=utf-8\"><meta name=\"viewport\" content=\"width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1\"><meta content=\"IE=Edge\"><style type=\"text/css\">\r\n<!--\r\nbody ..."
}
I would like to process the HTML code further, but because it contains various escape sequences that are needed to make it a valid JSON string, deserialization with System.Text.Json fails with the following exception:
System.Text.Json.JsonReaderException: '<' is an invalid start of a value.
I have tried using the following code to deserialize the content of the Body attribute:
var options = new JsonSerializerOptions()
{
    Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
    WriteIndented = true
};
var content = JsonSerializer.Deserialize<String>(html, options);
The elements causing errors are for example:
\"
<!--
\r
\n
\t
I'm curious to learn how the Body attribute from the JSON above can be cleaned so that it contains only valid HTML. Maybe someone from the community has an idea about this.
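One thing that may be worth checking: the error message suggests that the text handed to the deserializer begins with `<`, which would be the case if `html` already holds raw HTML rather than a JSON string. A minimal sketch of the alternative, assuming the complete, unmodified response text is available, is to parse the whole JSON object and read Body from it, so that the JSON parser decodes sequences such as `\"`, `\r`, `\n` and `\t` itself. The names `BodyExtractor`, `ExtractBody` and `responseJson` are hypothetical and only used for illustration.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper, not part of the original code.
public static class BodyExtractor
{
    // Parses the complete JSON object (not just the HTML fragment)
    // and returns the decoded value of its "Body" property.
    public static string ExtractBody(string responseJson)
    {
        using JsonDocument doc = JsonDocument.Parse(responseJson);

        // GetString() returns the unescaped string, so JSON escapes such as
        // \" \r \n \t come back as the actual characters.
        // GetProperty throws KeyNotFoundException if "Body" is missing.
        return doc.RootElement.GetProperty("Body").GetString() ?? string.Empty;
    }
}
```

Calling `BodyExtractor.ExtractBody(responseJson)` should then return the HTML with real quotes and line breaks. Note that `Encoder` and `WriteIndented` in `JsonSerializerOptions` affect serialization (writing) only, so they are not expected to change how the input is read.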