It seems that many people find the sanitizer rather useless. Instead of using it, just encode everything and then decode the safe parts back:
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Only exact, attribute-free tags are whitelisted.
private static readonly IEnumerable<string> WhitelistedTags =
    new[] { "<b>", "</b>", "<i>", "</i>" };

// Each whitelisted tag paired with its encoded form, computed once.
private static readonly (string Encoded, string Decoded)[] DecodingPairs =
    WhitelistedTags
        .Select(tag => (Microsoft.Security.Application.Encoder.HtmlEncode(tag), tag))
        .ToArray();

public static string Sanitize(string html)
{
    // Encode the whole thing
    var safeHtml = Microsoft.Security.Application.Encoder.HtmlEncode(html);
    var builder = new StringBuilder(safeHtml);

    // Decode the safe parts
    foreach (var (encodedTag, decodedTag) in DecodingPairs)
    {
        builder.Replace(encodedTag, decodedTag);
    }

    return builder.ToString();
}
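To see what this does to typical input, here is a small self-contained sketch of the same encode-then-decode idea. It makes two assumptions that are not part of the code above: it uses `System.Net.WebUtility.HtmlEncode` in place of the AntiXSS encoder so that it runs without extra packages, and the class name `WhitelistSanitizerDemo` is made up for illustration.

```csharp
using System;
using System.Net;
using System.Text;

public static class WhitelistSanitizerDemo
{
    // Same whitelist as above: exact, attribute-free tags only.
    private static readonly string[] WhitelistedTags = { "<b>", "</b>", "<i>", "</i>" };

    // Assumption: WebUtility.HtmlEncode stands in for the AntiXSS encoder.
    public static string Sanitize(string html)
    {
        // Encode everything first, so no markup survives by default.
        var builder = new StringBuilder(WebUtility.HtmlEncode(html));

        // Then restore only the exact encoded forms of the whitelisted tags.
        foreach (var tag in WhitelistedTags)
        {
            builder.Replace(WebUtility.HtmlEncode(tag), tag);
        }

        return builder.ToString();
    }

    public static void Main()
    {
        // The plain <b> pair is restored; the script tag stays encoded.
        Console.WriteLine(Sanitize("<b>bold</b> <script>alert(1)</script>"));

        // A <b> carrying an attribute does not match the whitelist exactly,
        // so the opening tag stays encoded.
        Console.WriteLine(Sanitize("<b onclick=\"evil()\">x</b>"));
    }
}
```

Because the match is on exact strings, a tag with any attribute is never decoded. Note that in the second call the closing `</b>` is still restored, so the output can contain unbalanced tags.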
Please note that it's nearly impossible to safely decode an IMG tag, since there are really simple ways for an attacker to abuse this tag. Examples:
<IMG SRC="javascript:alert('XSS');">
<IMG SRC=javascript:alert('XSS')>
Take a look here for a more thorough XSS Cheat Sheet.