6

I have a simple forum application. When someone posts content, I do:

post.Content = Sanitizer.GetSafeHtml(post.Content);

Now, I am not sure if I am doing something wrong or what is going on, but it allows almost no HTML. Even a simple <b></b> is too much for it. So I guess that tool is totally useless.

Now my question: can anyone tell me how I should sanitize my users' input so that they can post some images (<img> tags), use bold emphasis, etc.?

Ray Cheng
ojek
  • What are you trying to do? Are you trying to strip out the dangerous html tags? Maybe you are looking for `post.Content = Encoder.HtmlEncode("test");` – Ray Cheng Sep 23 '12 at 17:11
  • 1
    Yes, I want to strip out the dangerous tags. `.HtmlEncode` encodes all the tags, so `` or `` won't work... – ojek Sep 23 '12 at 17:18
  • 1
    What do you mean "won't work"? If you display `<b>test</>` in a browser, the browser will make `test` bold. Remember that browsers display `test` and `<b>test</>` the same way. If you want to strip out some html tags but keep some others? I think that's a risky thing to do. – Ray Cheng Sep 23 '12 at 17:30
  • 4
    @RayCheng you're talking crazy talk. – McGarnagle Sep 23 '12 at 17:34
  • 5
    @RayCheng If I give the browser `<b>test</>` as output, it will display `test`, not make the `test` string bold. That is what it is made for. :) – ojek Sep 23 '12 at 17:38

3 Answers

6

It seems that many people find the sanitizer rather useless. Instead of using the sanitizer, just encode everything, and decode safe parts back:

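// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
// plus a reference to the AntiXSS library (Microsoft.Security.Application).
// Only these exact tag strings are restored after encoding; tags with attributes stay encoded.
private static readonly IEnumerable<string> WhitelistedTags =
    new[] { "<b>", "</b>", "<i>", "</i>" };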

private static readonly (string Encoded, string Decoded)[] DecodingPairs =
    WhitelistedTags
    .Select(tag => (Microsoft.Security.Application.Encoder.HtmlEncode(tag), tag))
    .ToArray();

public static string Sanitize(string html)
{
    // Encode the whole thing
    var safeHtml = Microsoft.Security.Application.Encoder.HtmlEncode(html);
    var builder = new StringBuilder(safeHtml);

    // Decode the safe parts
    foreach (var (encodedTag, decodedTag) in DecodingPairs)
    {
        builder.Replace(encodedTag, decodedTag);
    }

    return builder.ToString();
}
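
To see what the encode-then-decode approach does to actual input, here is a self-contained variant that can be run as-is. It is only a sketch: it swaps in the framework's `System.Net.WebUtility.HtmlEncode` for the AntiXSS `Encoder` (an assumption made so that no extra package is needed), and the class name `WhitelistSanitizerSketch` is hypothetical.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text;

// Hypothetical class name; WebUtility.HtmlEncode stands in for the AntiXSS Encoder.
public static class WhitelistSanitizerSketch
{
    private static readonly string[] WhitelistedTags = { "<b>", "</b>", "<i>", "</i>" };

    // Pair each whitelisted tag with its encoded form, using the same encoder
    // that Sanitize uses, so the two always match.
    private static readonly (string Encoded, string Decoded)[] DecodingPairs =
        WhitelistedTags
        .Select(tag => (WebUtility.HtmlEncode(tag), tag))
        .ToArray();

    public static string Sanitize(string html)
    {
        // Encode everything first
        var builder = new StringBuilder(WebUtility.HtmlEncode(html));

        // Then restore only the exact whitelisted tag strings
        foreach (var (encodedTag, decodedTag) in DecodingPairs)
        {
            builder.Replace(encodedTag, decodedTag);
        }

        return builder.ToString();
    }
}
```

Calling `WhitelistSanitizerSketch.Sanitize("<b>Hi</b> <script>alert(1)</script>")` returns `<b>Hi</b> &lt;script&gt;alert(1)&lt;/script&gt;`: the bold tags come back, the script tag stays encoded. A tag with attributes, such as `<b onclick="...">`, also stays encoded, because only the exact strings in the whitelist are decoded.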

Please note that it's nearly impossible to safely decode an IMG tag, since there are really simple ways for an attacker to abuse this tag. Examples:

<IMG SRC="javascript:alert('XSS');">

<IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>

Take a look here for a more thorough XSS Cheat Sheet.
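
Given how easily the `src` attribute is abused, one cautious option is to not decode user-written `<img>` markup at all. Instead, accept only an image URL, check that it is an absolute `http` or `https` URL, and generate the tag yourself. The following is a minimal sketch of that idea, not part of the tested code above: `ImageTagBuilder` and `TryBuildImgTag` are hypothetical names, while `System.Uri` and `System.Net.WebUtility` are standard .NET types. It narrows the attack surface but is not a complete defense.

```csharp
using System;
using System.Net;

// Hypothetical helper: builds an <img> tag from a user-supplied URL
// instead of trusting user-written markup.
public static class ImageTagBuilder
{
    public static bool TryBuildImgTag(string userSuppliedUrl, out string imgTag)
    {
        imgTag = string.Empty;

        // Reject anything that is not a well-formed absolute URI
        if (!Uri.TryCreate(userSuppliedUrl, UriKind.Absolute, out Uri uri))
        {
            return false;
        }

        // Allow only http and https; this rejects javascript:, data:, etc.
        if (uri.Scheme != Uri.UriSchemeHttp && uri.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        // Encode the normalized URL before placing it in the attribute value
        imgTag = "<img src=\"" + WebUtility.HtmlEncode(uri.AbsoluteUri) + "\" alt=\"\">";
        return true;
    }
}
```

With this sketch, `http://example.com/a.png` yields `<img src="http://example.com/a.png" alt="">`, while `javascript:alert('XSS');` is rejected because its scheme is not on the allowed list.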

Steven
  • 1
    Yes, thank you, but I am looking for a ready-made solution that I can use. I think it will be safer and will save me some time. It's sad that the sanitizer does not work; it was my best option so far... – ojek Sep 23 '12 at 17:42
  • The line `safeHtml.Replace(encodedTag,decodedTag);` doesn't do anything as the result of the `Replace` method is lost? – Rafael Apr 13 '19 at 09:16
  • 1
    Hi @Zesty, that code change has no effect. A `StringBuilder`'s `Replace` method just returns a reference to itself. I refactored the code a bit though to make it more readable. I also tested the code; it does what it is intended to do. – Steven May 03 '21 at 12:20
1

This post best describes the issues with the Anti XSS library and provides a good workaround that whitelists a set of tags and attributes.

I'm using this solution in my project and it seems to work great.

Joel Mitchell
-1

There is a quite simple way to block the threat by just getting rid of the "dangerous" tags.

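// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
string SanitizeHtml(string html)
{
    // Decode first so that entity-encoded tags are matched by the blacklist as well
    html = System.Web.HttpUtility.HtmlDecode(html);

    List<string> blackListedTags = new List<string>()
    {
        "body", "script", "iframe", "form", "object", "embed", "link", "head", "meta"
    };

    foreach (string tag in blackListedTags)
    {
        // \b keeps longer tag names intact, e.g. <header> is not mangled by "head"
        html = Regex.Replace(html, "<" + tag + @"\b", "<p", RegexOptions.IgnoreCase);
        html = Regex.Replace(html, "</" + tag + @"\b", "</p", RegexOptions.IgnoreCase);
    }

    return html;
}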

With this the user will still see what is within the dangerous script, but it won't harm anything.

Plamen Kasabov
  • I do not recommend using blacklisting, as done in this answer, because it is almost impossible to do this in a way that can't be abused by a hacker. Just take a look at [how many ways there are to abuse images](https://owasp.org/www-community/xss-filter-evasion-cheatsheet) to inject scripts. – Steven May 03 '21 at 12:23