HttpUtility.HtmlEncode, HttpUtility.HtmlDecode, the AntiXSS library and correctly formatting user-entered input

Question

I'm trying to develop a secure web application that can accept form data, encode it into the database to eliminate cross-site scripting issues, and then format it nicely on other web pages.

Form data is being encoded using

HttpUtility.HtmlEncode("It's my wedding!")

An example of this working is someone entering "It's my wedding!" into a textbox. This enters the database formatted as:

It&#39;s my wedding!

If I then pull this out of the database and display it using a .NET literal control, it's displayed exactly like that, with the apostrophe remaining encoded on the screen.

Web browsers interpret &amp; as an ampersand and &copy; as a copyright symbol. Why don't they interpret the code &#39; as an apostrophe?
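Browsers do render the numeric entity for an apostrophe as an apostrophe, so one possible explanation for the symptom is that the value is being encoded a second time on the way out. This is an assumption, since the question doesn't show the output code; a Literal control with Mode="Encode" or a second HtmlEncode call would both have this effect. A minimal sketch of what double encoding does to the string (the class and method names are made up for illustration):

```csharp
using System;
using System.Web;

public static class DoubleEncodingDemo
{
    // Encoded once, e.g. before saving to the database.
    public static string EncodeOnce(string raw) => HttpUtility.HtmlEncode(raw);

    // Encoded again, e.g. by an output path that encodes its text (assumed).
    public static string EncodeTwice(string raw) =>
        HttpUtility.HtmlEncode(HttpUtility.HtmlEncode(raw));

    public static void Main()
    {
        string raw = "It's my wedding!";

        // The browser shows an apostrophe for this one.
        Console.WriteLine(EncodeOnce(raw));   // It&#39;s my wedding!

        // The ampersand itself is now escaped, so the browser shows
        // the entity text literally instead of an apostrophe.
        Console.WriteLine(EncodeTwice(raw));  // It&amp;#39;s my wedding!
    }
}
```

If that is what's happening here, encoding exactly once, at the point of output, should make the apostrophe display normally.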

Say that I then use:

HttpUtility.HtmlDecode("It&#39;s my wedding!");

This will sort out my apostrophe issue. But suppose someone has managed to inject malicious JavaScript into this field, such as:

It's my wedding!<script type="text/javascript">alert('XSS!');</script>

HtmlDecode will also decode the encoded JavaScript, and the attack will execute. If this is the case, why are we using HttpUtility.HtmlEncode() in the first place?
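To make the concern concrete, here is a small sketch of the round trip: encoding neutralises the script tag, and decoding the whole string brings it straight back. The class and method names are hypothetical.

```csharp
using System;
using System.Web;

public static class RoundTripDemo
{
    public static string Encode(string input) => HttpUtility.HtmlEncode(input);

    public static string Decode(string encoded) => HttpUtility.HtmlDecode(encoded);

    public static void Main()
    {
        string input =
            "It's my wedding!<script type=\"text/javascript\">alert('XSS!');</script>";

        string encoded = Encode(input);
        string decoded = Decode(encoded);

        // After encoding, no literal "<script" remains in the string.
        Console.WriteLine(encoded.Contains("<script"));  // False

        // Decoding everything restores the original markup, script tag included.
        Console.WriteLine(decoded == input);             // True
    }
}
```

So a blanket HtmlDecode before output undoes exactly the protection that HtmlEncode provided.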

I've seen people using the Microsoft AntiXss library at http://wpl.codeplex.com/, but it seems to be receiving horrendous reviews about its quality and effectiveness due to users' inability to amend the white-list that it offers.

What are you supposed to do to safely encode HTML and allow it to display whilst still preventing XSS attacks? Is stripping / encoding the tags specifically the only solution?

How has everyone handled this before?

Thanks!

Karl

One quick comment: save it in the database as it is, and encode it when you print it on the web page. – Aristos, Feb 28 '13 at 09:29

score 3 · Accepted Answer · answered Apr 09 '13 at 10:57

Okay, so here's the solution I've arrived at.

I want to guard against other developers switching off request validation and outputting fields without checking what they're outputting, so I'm going to use the HttpUtility.HtmlEncode method to encode the input. This means that when other developers output this information, it's still encoded; if they then choose to blithely pass the contents through HttpUtility.HtmlDecode, that's their responsibility.

I, however, will build a method that un-escapes only the most basic formatting characters that I see frequently in my user input and that can be considered safe. In my case, those characters are single quotes and double quotes. All other content will remain encoded. If a particular safe character that I haven't addressed turns up often in real-life user input or test input, I'll retrospectively add it to the whitelist.
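A rough sketch of what such a method might look like, assuming the stored value has already been through HttpUtility.HtmlEncode. The name DecodeQuotesOnly and the class around it are invented for illustration; this is not the code the answer's author actually wrote.

```csharp
using System;
using System.Web;

public static class QuoteWhitelist
{
    // Hypothetical helper: takes text that was already HtmlEncoded and
    // restores only single and double quotes. Everything else stays encoded.
    public static string DecodeQuotesOnly(string encoded)
    {
        if (string.IsNullOrEmpty(encoded))
        {
            return encoded;
        }

        return encoded
            .Replace("&#39;", "'")     // single quote
            .Replace("&quot;", "\"");  // double quote
    }

    public static void Main()
    {
        string stored = HttpUtility.HtmlEncode(
            "It's my \"wedding\"!<script>alert('XSS!');</script>");

        // Quotes come back; the angle brackets remain as &lt; and &gt;.
        Console.WriteLine(DecodeQuotesOnly(stored));
        // It's my "wedding"!&lt;script&gt;alert('XSS!');&lt;/script&gt;
    }
}
```

Note that unescaped quotes are only harmless in element content. If the result is ever written inside an HTML attribute value, a quote can close the attribute, so this output should not be reused there.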

score 2 · Answer 2 · edited Mar 02 '13 at 00:26


How are you receiving the data?

The .NET WebForms infrastructure itself should block a lot of these things by default anyway, assuming ValidateRequest is set to true.

HtmlEncode should be used when outputting data that was entered by users (thus preventing nastiness). HtmlDecode doesn't come to the party in this scenario.
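As a sketch of that approach: store the raw input, and encode only at the point where the HTML is built. RenderComment is a hypothetical helper, and the surrounding paragraph tag is just an example of output markup.

```csharp
using System;
using System.Web;

public static class EncodeOnOutput
{
    // Hypothetical rendering helper: the database holds the raw user input,
    // and encoding happens only when the HTML is produced.
    public static string RenderComment(string rawFromDatabase)
    {
        return "<p>" + HttpUtility.HtmlEncode(rawFromDatabase) + "</p>";
    }

    public static void Main()
    {
        // The browser displays the entity as a normal apostrophe.
        Console.WriteLine(RenderComment("It's my wedding!"));
        // <p>It&#39;s my wedding!</p>

        // Injected markup is shown as text rather than executed.
        Console.WriteLine(RenderComment("<script>alert('XSS!');</script>"));
        // <p>&lt;script&gt;alert(&#39;XSS!&#39;);&lt;/script&gt;</p>
    }
}
```

Because the value is encoded exactly once, there is nothing to decode afterwards.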

edited Mar 02 '13 at 00:26 by Jesse · answered Mar 02 '13 at 00:06 by Keith Jackson

Thanks for answering Keith! If you HtmlEncode, then nastiness is indeed prevented, but formatting is utterly lost, as characters such as the apostrophe mentioned above are encoded and the browser doesn't interpret and format them correctly. I want formatted content but encoded HTML - Is that beyond the help of this utility? – Karl Mar 02 '13 at 16:39
