I'm trying to develop a secure web application that can accept form data, encode it into the database to eliminate cross-site scripting issues, and then format it nicely on other web pages.
Form data is being encoded using:
HttpUtility.HtmlEncode("It's my wedding!")
An example of this working is someone entering "It's my wedding!" into a textbox. This enters the database formatted as:
It&#39;s my wedding!
If I then pull this out of the database and display it using a .NET literal control, it's displayed exactly like that, with the apostrophe remaining encoded on the screen.
Web browsers interpret &amp; as an ampersand and &copy; as a copyright symbol. Why don't they interpret the code &#39; as an apostrophe?
Say that I then use:
HttpUtility.HtmlDecode("It&#39;s my wedding!");
This will sort out my apostrophe issue, but suppose I use the HtmlDecode method when someone has managed to inject malicious JavaScript into this field, so that the stored (encoded) value is:
It&#39;s my wedding!&lt;script type=&quot;text/javascript&quot;&gt;alert(&#39;XSS!&#39;);&lt;/script&gt;
HtmlDecode will also decode the encoded JavaScript, and the attack will execute. If this is the case, why are we using HttpUtility.HtmlEncode() in the first place?
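To make the round trip concrete, here is a minimal console sketch of what I'm describing. It assumes System.Net.WebUtility as a stand-in for HttpUtility (so it runs outside a web project; as far as I know the two behave the same for these characters), and the class and helper names are made up purely for illustration.

```csharp
using System;
using System.Net;

// Sketch only: WebUtility stands in for HttpUtility so this runs as a plain console app.
public static class EncodingRoundTrip
{
    // Hypothetical helper: encode once (as on save), then decode once (as on display).
    public static string EncodeThenDecode(string input)
    {
        string stored = WebUtility.HtmlEncode(input);  // what goes into the database
        return WebUtility.HtmlDecode(stored);          // what HtmlDecode hands back to the page
    }

    public static void Main()
    {
        string harmless = "It's my wedding!";
        string hostile = "It's my wedding!<script type=\"text/javascript\">alert('XSS!');</script>";

        // The apostrophe is stored as a numeric entity.
        Console.WriteLine(WebUtility.HtmlEncode(harmless)); // It&#39;s my wedding!

        // The script tag is stored as inert text (&lt;script ...&gt;).
        Console.WriteLine(WebUtility.HtmlEncode(hostile));

        // Decoding restores the original string exactly, script tag included.
        Console.WriteLine(EncodeThenDecode(hostile) == hostile); // True
    }
}
```

So decoding before output simply undoes the encoding, which is exactly the problem I'm asking about.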
I've seen people using the Microsoft AntiXss library at http://wpl.codeplex.com/, but it seems to be receiving horrendous reviews about its quality and effectiveness due to users' inability to amend the white-list that it offers.
What are you supposed to do to safely encode HTML and allow it to display whilst still preventing XSS attacks? Is stripping / encoding the tags specifically the only solution?
How has everyone handled this before?
Thanks!
Karl