For example, StackExchange whitelists a subset of HTML: https://meta.stackexchange.com/questions/1777/what-html-tags-are-allowed-on-stack-exchange-sites
How could you do that in your controller to make sure user input is safe?
This approach is not identical to StackExchange's, but I found the AntiXSS 4.x library to be a simple way to sanitize the input and allow only "safe" HTML.
You can download a version from http://www.microsoft.com/en-us/download/details.aspx?id=28589 but I linked it mainly for the useful DOCX file. My preferred method is to use the NuGet package manager to get the latest AntiXSS package.
You can use the HtmlSanitizationLibrary assembly found in the 4.x AntiXSS library. Note that GetSafeHtml() lives in HtmlSanitizationLibrary, on the Microsoft.Security.Application.Sanitizer class.
content = Sanitizer.GetSafeHtml(userInput);
This can be done before saving to the database. The advantage is that malicious content is removed immediately, so you don't have to worry about it at output time. The disadvantage is that it won't clean content already in the database, and you have to apply it on every database update.
The alternative approach is to call this method every time you output content.
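To make the "sanitize before saving" option concrete, here is a rough sketch of what the controller action might look like. It assumes ASP.NET MVC with the AntiXSS 4.x package referenced. ICommentStore, CommentsController, and the Index redirect target are hypothetical names made up for illustration, and whether [ValidateInput(false)] alone is enough to let HTML through depends on your MVC version and request validation settings.

```csharp
using System.Web.Mvc;
using Microsoft.Security.Application; // AntiXSS 4.x, HtmlSanitizationLibrary assembly

// Hypothetical persistence interface, defined only so the sketch is complete.
public interface ICommentStore
{
    void Save(string html);
}

public class CommentsController : Controller
{
    private readonly ICommentStore _store;

    // Assumes a dependency resolver supplies the store.
    public CommentsController(ICommentStore store)
    {
        _store = store;
    }

    [HttpPost]
    [ValidateInput(false)] // let raw HTML reach the action instead of being rejected by request validation
    public ActionResult Create(string body)
    {
        // Sanitize first, so only the cleaned markup is ever stored.
        string safeHtml = Sanitizer.GetSafeHtmlFragment(body ?? string.Empty);

        _store.Save(safeHtml);
        return RedirectToAction("Index"); // assumes an Index action exists
    }
}
```

For the sanitize-on-output option, the same Sanitizer call would move to wherever the stored value is rendered.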
I'd love to hear what the preferred approach is.
You can try the jsoup parser, a Java library that sanitizes HTML input and also provides a lot of other functionality out of the box. See http://jsoup.org/ for more details and to download the binary. It provides DOM methods for traversing the HTML tree and extracting the elements you want.
Although sanitizing your generated HTML to prevent XSS attacks is good practice, I would strongly advise against relying on a parser to do it. If your HTML tree is very large, the response time can increase many times over. Instead of sanitizing the HTML tree, you should ensure that whatever the user enters in the form is valid and matches the expected value.
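As a rough illustration of validating form fields against expected values, here is a minimal C# sketch. The FormValidator class, the field names, and the specific rules (a 3 to 20 character user name, an age between 0 and 150) are assumptions chosen for the example, not anything prescribed by OWASP or the answers here.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class FormValidator
{
    // Assumed rule: 3-20 characters, ASCII letters, digits, or underscore only.
    // \A and \z anchor the whole string, so a trailing newline cannot slip through.
    private static readonly Regex UserNamePattern =
        new Regex(@"\A[A-Za-z0-9_]{3,20}\z", RegexOptions.CultureInvariant);

    public static bool IsValidUserName(string value)
    {
        return value != null && UserNamePattern.IsMatch(value);
    }

    // Accepts digits only (no sign, whitespace, or separators), then range-checks.
    public static bool TryParseAge(string value, out int age)
    {
        return int.TryParse(value, NumberStyles.None, CultureInfo.InvariantCulture, out age)
            && age >= 0
            && age <= 150;
    }
}
```

Input that fails these checks is rejected outright rather than cleaned, which is the point of this approach. It only works for fields with a well-defined shape, so it does not replace sanitizing or encoding for free-form rich text.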
You can visit www.owasp.org to learn more about preventing XSS attacks. The site provides cheat sheets to help ensure your HTML is free of XSS vulnerabilities.
ASP.NET's HttpUtility.HtmlEncode() does the encoding for you. But if you want to block dangerous scripts, DO NOT insert them into your database in the first place: clean the HTML text before inserting it.
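For reference, a minimal sketch of the encoding step. HttpUtility.HtmlEncode is the real System.Web method; the CommentRenderer class and RenderComment helper are hypothetical names used only to show where the call would sit.

```csharp
using System.Web; // HttpUtility

public static class CommentRenderer
{
    // Encodes the user-supplied text so the browser displays it as text
    // instead of interpreting it as markup.
    public static string RenderComment(string userInput)
    {
        return "<p>" + HttpUtility.HtmlEncode(userInput) + "</p>";
    }
}
```

With this, an input such as `<script>alert(1)</script>` ends up in the page as `&lt;script&gt;alert(1)&lt;/script&gt;`. Note that encoding neutralizes all markup, so it fits when no user HTML should be allowed at all; it is not a way to permit a whitelisted subset of tags.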
I found a class that does it for you: http://eksith.wordpress.com/2012/02/13/antixss-4-2-breaks-everything/
It works fine, and you can add new tags and attributes to the sanitizer's custom whitelist.
Note: the Microsoft Sanitizer and Anti-XSS Library were not useful for me, but you may want to try them as well.