How to properly sanitize content with AntiXss Library?

Question

I have a simple forum application. When someone posts any content, I do:

post.Content = Sanitizer.GetSafeHtml(post.Content);

Now, I am not sure whether I am doing something wrong or what is going on, but it allows almost no HTML. Even simple formatting tags are too much for it. So I guess that tool is totally useless.

Now my question: can anyone tell me how I should sanitize my users' input so that they can post some images (<img> tags) and use bold emphasis, etc.?

What are you trying to do? Are you trying to strip out the dangerous html tags? Maybe you are looking for `post.Content = Encoder.HtmlEncode("test");` — Ray Cheng, Sep 23 '12 at 17:11
Yes, I want to strip out the dangerous tags. `.HtmlEncode` encodes all the tags, so even harmless formatting tags won't work... — ojek, Sep 23 '12 at 17:18
What do you mean "won't work"? If you display the encoded `<b>test</b>` in a browser, the browser will make `test` bold. Remember that browsers display `<b>test</b>` and its encoded form the same way. Do you want to strip out some HTML tags but keep others? I think that's a risky thing to do. — Ray Cheng, Sep 23 '12 at 17:30
@RayCheng If I give the browser the encoded `&lt;b&gt;test&lt;/b&gt;` as output, it will display the literal text `<b>test</b>`, not make the `test` string bold. That is what it is made for. :) — ojek, Sep 23 '12 at 17:38
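What encoding does to markup is easy to check. This minimal sketch uses `System.Net.WebUtility.HtmlEncode` from the .NET base library as a stand-in for AntiXss's `Encoder.HtmlEncode` (an assumption: the two encoders differ in exactly which characters they escape, but both turn `<` and `>` into entities). The encoded string is what the browser receives, and it renders as literal text rather than as bold.

```csharp
using System;
using System.Net;

// Stand-in for Microsoft.Security.Application.Encoder.HtmlEncode:
// WebUtility.HtmlEncode also replaces '<' and '>' with character entities.
string encoded = WebUtility.HtmlEncode("<b>test</b>");

// The browser receives &lt;b&gt;test&lt;/b&gt; and displays the literal
// characters <b>test</b>; the word "test" is not rendered bold.
Console.WriteLine(encoded);
```

This is why encoding alone is safe but also removes all formatting, which is exactly what the question wants to avoid.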

Steven · Answer 1 · 2021-05-03T12:18:07.110

It seems that many people find the sanitizer rather useless. Instead of using it, just encode everything and then decode the safe parts back:

using System.Collections.Generic;
using System.Linq;
using System.Text;

private static readonly IEnumerable<string> WhitelistedTags =
    new[] { "<b>", "</b>", "<i>", "</i>" };

private static readonly (string Encoded, string Decoded)[] DecodingPairs =
    WhitelistedTags
    .Select(tag => (Microsoft.Security.Application.Encoder.HtmlEncode(tag), tag))
    .ToArray();

public static string Sanitize(string html)
{
    // Encode the whole thing
    var safeHtml = Microsoft.Security.Application.Encoder.HtmlEncode(html);
    var builder = new StringBuilder(safeHtml);

    // Decode the safe parts
    foreach (var (encodedTag, decodedTag) in DecodingPairs)
    {
        builder.Replace(encodedTag, decodedTag);
    }

    return builder.ToString();
}
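To try the same encode-then-decode idea without the AntiXss NuGet package, here is a runnable sketch that swaps in `System.Net.WebUtility.HtmlEncode` (an assumption; the encoded forms differ between encoders, which is why the whitelist is encoded with the same function as the input). It uses C# top-level statements and a local function instead of class members.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text;

// Exact tag strings allowed back through; tags with attributes never match.
string[] whitelistedTags = { "<b>", "</b>", "<i>", "</i>" };

// Encode the whitelist with the same encoder used for the input,
// so the encoded forms are guaranteed to match.
(string Encoded, string Decoded)[] decodingPairs = whitelistedTags
    .Select(tag => (WebUtility.HtmlEncode(tag), tag))
    .ToArray();

string Sanitize(string html)
{
    // Encode everything: after this step the string contains no live markup.
    var builder = new StringBuilder(WebUtility.HtmlEncode(html));

    // Turn only the exact encoded whitelist tags back into markup.
    foreach (var (encodedTag, decodedTag) in decodingPairs)
    {
        builder.Replace(encodedTag, decodedTag);
    }

    return builder.ToString();
}

// <b> is restored; <script> stays encoded as harmless text.
Console.WriteLine(Sanitize("<b>hi</b><script>alert(1)</script>"));

// The opening <b> with an attribute does not match "&lt;b&gt;" and stays
// encoded; only the plain closing </b> is restored.
Console.WriteLine(Sanitize("<b onclick=\"alert(1)\">hi</b>"));
```

Note that unbalanced input, such as a lone `<b>`, is restored as-is and will bold the rest of the page; closing stray tags would need extra work.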

Please note that it's nearly impossible to safely decode an IMG tag, since there are really simple ways for an attacker to abuse this tag. Examples:

<IMG SRC="javascript:alert('XSS');">

<IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>

Take a look here for a more thorough XSS cheat sheet.
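The second example shows why string-matching filters fail: the entity-encoded value contains no `javascript:` substring until it is decoded, and browsers decode character references in attribute values. This small sketch runs a naive substring filter (a hypothetical `NaiveFilterBlocks` helper) over both example values and prints what `System.Net.WebUtility.HtmlDecode` turns them into.

```csharp
using System;
using System.Net;

string[] srcValues =
{
    "javascript:alert('XSS');",
    "&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;",
};

// Hypothetical naive filter: reject values that contain "javascript:".
bool NaiveFilterBlocks(string value) =>
    value.IndexOf("javascript:", StringComparison.OrdinalIgnoreCase) >= 0;

foreach (string src in srcValues)
{
    // Decode the character references, as the browser does for attribute values.
    string decoded = WebUtility.HtmlDecode(src);
    Console.WriteLine($"naive filter blocks: {NaiveFilterBlocks(src)}, decoded: {decoded}");
}
```

The naive filter catches the first value but not the second, even though both decode to a `javascript:` URL. Any IMG handling therefore has to decode first and then allow only known-good URL schemes, rather than search for bad ones.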

Yes, thank you, but I am looking for a ready-made solution that I can just use. I think it will be safer and will save me some time. It's sad that the sanitizer does not work; it was my best option so far... — ojek, Sep 23 '12 at 17:42
The line `safeHtml.Replace(encodedTag,decodedTag);` doesn't do anything as the result of the `Replace` method is lost? — Rafael, Apr 13 '19 at 09:16
Hi @Zesty, that code change has no effect. A `StringBuilder`'s `Replace` method just returns a reference to itself. I refactored the code a bit though to make it more readable. I also tested the code; it does what it is intended to do. — Steven, May 03 '21 at 12:20

score 1 · Answer 2 · answered Dec 18 '12 at 09:53

This post best describes the issues with the AntiXss library and provides a good workaround that whitelists a set of tags and attributes.

I'm using this solution in my project and it seems to work great.
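The linked post's code is not reproduced here. As a rough illustration of whitelisting both tags and attributes, this is a hypothetical sketch (not the linked workaround) that extends the encode-first idea: after encoding everything, it recognizes only `<img src="...">` whose URL parses as absolute `http`/`https`, and rebuilds that tag from the validated URL, plus attribute-free `<b>`/`<i>`. `System.Net.WebUtility` stands in for the AntiXss encoder; the tag set, the regex and the helper names are assumptions.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Matches an encoded <img src="..."> (optionally self-closing) after the whole input
// has been HTML-encoded. Any '"' the user typed is already &quot;, so the captured
// value cannot contain one and cannot break out of the attribute.
var encodedImg = new Regex(@"&lt;img src=&quot;((?:(?!&quot;).)*)&quot;\s*/?&gt;",
    RegexOptions.IgnoreCase);

// Allow only absolute http(s) URLs; javascript:, data:, relative paths etc. are rejected.
bool TryGetSafeUrl(string value, out string url)
{
    url = string.Empty;
    if (Uri.TryCreate(value, UriKind.Absolute, out var uri) &&
        (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps))
    {
        url = uri.AbsoluteUri;
        return true;
    }
    return false;
}

string Sanitize(string html)
{
    // 1. Encode everything, so no markup survives by default.
    string result = WebUtility.HtmlEncode(html);

    // 2. Rebuild allowed <img> tags from the validated URL; anything else stays encoded.
    result = encodedImg.Replace(result, m =>
        TryGetSafeUrl(WebUtility.HtmlDecode(m.Groups[1].Value), out string url)
            ? "<img src=\"" + WebUtility.HtmlEncode(url) + "\">"
            : m.Value);

    // 3. Restore attribute-free formatting tags.
    foreach (string tag in new[] { "<b>", "</b>", "<i>", "</i>" })
    {
        result = result.Replace(WebUtility.HtmlEncode(tag), tag);
    }

    return result;
}

Console.WriteLine(Sanitize("<b>Look:</b> <img src=\"https://example.com/cat.png\">"));
```

The design choice is that markup only reaches the output when the sanitizer writes it itself, from a parsed and re-encoded URL. Extra attributes such as `onerror` make the pattern fail to match, so the whole tag stays encoded.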

Joel Mitchell

Plamen Kasabov · Answer 3 · 2017-09-15T08:35:53.243

There is a fairly simple way to block the threat by getting rid of the "dangerous" tags.

using System.Collections.Generic;
using System.Text.RegularExpressions;

string SanitizeHtml(string html)
{
    html = System.Web.HttpUtility.HtmlDecode(html);

    List<string> blackListedTags = new List<string>()
    {
        "body", "script", "iframe", "form", "object", "embed", "link", "head", "meta"
    };

    // Rename every opening and closing blacklisted tag to a harmless <p>.
    foreach (string tag in blackListedTags)
    {
        html = Regex.Replace(html, "<" + tag, "<p", RegexOptions.IgnoreCase);
        html = Regex.Replace(html, "</" + tag, "</p", RegexOptions.IgnoreCase);
    }

    return html;
}

With this the user will still see what is within the dangerous script, but it won't harm anything.
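The weakness of this blacklist can be checked directly. This sketch ports the function above, swapping in `System.Net.WebUtility.HtmlDecode` for `System.Web.HttpUtility.HtmlDecode` (an assumption made so it runs outside ASP.NET; both decode HTML entities), and feeds it payloads the list does not cover.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Port of the blacklist function above.
string SanitizeHtml(string html)
{
    html = WebUtility.HtmlDecode(html);
    string[] blackListedTags =
        { "body", "script", "iframe", "form", "object", "embed", "link", "head", "meta" };
    foreach (string tag in blackListedTags)
    {
        html = Regex.Replace(html, "<" + tag, "<p", RegexOptions.IgnoreCase);
        html = Regex.Replace(html, "</" + tag, "</p", RegexOptions.IgnoreCase);
    }
    return html;
}

// A blacklisted tag is neutralized...
Console.WriteLine(SanitizeHtml("<script>alert(1)</script>"));

// ...but an event handler on a tag that is not on the list passes through unchanged.
Console.WriteLine(SanitizeHtml("<img src=x onerror=alert(1)>"));

// Decoding first also turns text the user deliberately escaped into live markup.
Console.WriteLine(SanitizeHtml("&lt;img src=x onerror=alert(1)&gt;"));
```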

I do not recommend using blacklisting, as done in this answer, because it is almost impossible to do this in a way that can't be abused by a hacker. Just take a look at [how many ways there are to abuse images](https://owasp.org/www-community/xss-filter-evasion-cheatsheet) to inject scripts. — Steven, May 03 '21 at 12:23
