If I have a user entering data into a rich text editor (tiny editor) and submitting data that I am storing in a database and then retrieving to show on other dynamic web pages, why do I need encoding here?

Is the only reason that someone might paste JavaScript into the rich text editor? Is there any other reason?

leora

9 Answers

Security is the reason.

The most obvious and common reason is Cross-Site Scripting (XSS). It is likely to be the root cause of most of the security problems you would see on your site.

Cross-site scripting (XSS) is a type of computer security vulnerability typically found in web applications that enables malicious attackers to inject client-side script into web pages viewed by other users. An exploited cross-site scripting vulnerability can be used by attackers to bypass access controls such as the same-origin policy. Cross-site scripting carried out on websites accounted for roughly 80% of all security vulnerabilities documented by Symantec as of 2007.[1] Their impact may range from a petty nuisance to a significant security risk, depending on the sensitivity of the data handled by the vulnerable site and the nature of any security mitigations implemented by the site's owner.

Additionally, as shown in the comments below, the layout of your site can also be screwed up.

You need the Microsoft Anti-Cross Site Scripting Library.

More Resources

http://forums.asp.net/t/1223756.aspx

  • Additionally, design-wise, if they enter
    .. with no closing tag, that would ruin the whole page layout. That's why public editors (like the one here on SO) don't accept raw HTML; they have their own subset of tags and THEY do the formatting to produce HTML.
    – Dan Heberden May 26 '10 at 16:30
  • You're missing the point. He's accepting HTML-formatted text, so he cannot escape it. – SLaks May 26 '10 at 16:33
  • @Dan Heberden: That's true agreed :) –  May 26 '10 at 16:33
  • I just realized that tiny editor seems to do this for you, so that's why I was confused about why everything was working without me doing anything – leora May 26 '10 at 16:43
  • Yes, but you **NEED** server-side validation, or anyone will be able to easily inject a ` – SLaks May 26 '10 at 16:44
  • @SLaks: That's the way to go; bad guys can disable JavaScript, so client-side JavaScript validation won't do, of course. –  May 26 '10 at 16:45
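Dan Heberden's point about editors that use their own markup subset instead of accepting raw HTML can be sketched in a few lines. This is a Python illustration under stated assumptions (the thread itself is about ASP.NET); the `[b]`/`[i]` syntax and the `render_restricted_markup` name are hypothetical. Everything the user typed is HTML-encoded first, and only afterwards are the allowed markup pairs turned into tags, so a user-supplied `<script>` stays inert and every generated tag comes with its own closing tag.

```python
import html
import re

# Hypothetical BBCode-style subset: only [b]...[/b] and [i]...[/i] are recognised.
# The content may not contain "[" or "]", so only innermost pairs ever match,
# which keeps the generated tags properly nested.
_PAIR = re.compile(r"\[(b|i)\]([^\[\]]*)\[/\1\]")


def render_restricted_markup(text: str) -> str:
    """Encode everything the user typed, then convert the allowed markup to HTML."""
    out = html.escape(text, quote=True)  # user-typed "<", ">", "&" become inert entities
    while True:
        converted = _PAIR.sub(r"<\1>\2</\1>", out)
        if converted == out:  # nothing left to convert
            return converted
        out = converted


print(render_restricted_markup("[b]Hi[/b] <script>alert(1)</script>"))
print(render_restricted_markup("[b]a[i]c[/i][/b]"))
print(render_restricted_markup("[b]unclosed"))
```

Unmatched or mis-nested markers such as `[b]unclosed` are left as literal text rather than guessed at, which is what keeps a stray opening marker from leaking an unclosed tag into the page.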

You're making some mistakes.

If you're accepting HTML-formatted text from the rich-text editor, you cannot call Html.Encode, or it will encode all of the HTML tags, and you'll see raw markup instead of formatted text.

However, you still need to protect against XSS.

In other words, if the user enters the following HTML:

<b>Hello!</b>
<script>alert('XSS!');</script>

You want to keep the <b> tag, but drop (not encode) the <script> tag.
Similarly, you need to drop inline event attributes (like onmouseover) and javascript: URLs (like <a href="javascript:alert('XSS!');">Dancing Bunnies!</a>).

You should run the user's HTML through a strict XML parser and maintain a strict white-list of tags and attributes when saving the content.

SLaks
  • The user isn't typing in HTML. The editor is rich text, so they type like in MS Word, and when I grab the data I get the encoded HTML as output. – leora May 26 '10 at 16:44
  • Yes, but you **NEED** server-side validation, or anyone will be able to easily inject a ` – SLaks May 26 '10 at 16:45
  • @SLaks - as tiny editor seems to give you already-encoded HTML, are you suggesting that I call Html.Encode() on already-encoded data? Wouldn't that cause issues in the normal use case? – leora May 26 '10 at 17:11
  • **No, I'm not**. You need to filter the tags and attributes. – SLaks May 26 '10 at 17:24
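SLaks' advice to filter tags and attributes against a whitelist can be sketched as follows. This is a minimal Python illustration, not ASP.NET code: it uses the standard library's lenient `html.parser.HTMLParser` in place of the strict XML parser the answer recommends, and the allowed tag, attribute, and URL-scheme sets are assumptions to adapt to whatever your editor emits. Disallowed tags are dropped while their text is kept, `<script>` and `<style>` are dropped along with their contents, event-handler attributes such as `onmouseover` never get through, and `href` values must use an allowed scheme after character references have been decoded.

```python
import html
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Assumed policy: adjust these sets to whatever your editor actually produces.
ALLOWED_TAGS = {"a", "b", "br", "em", "i", "li", "ol", "p", "strong", "u", "ul"}
ALLOWED_ATTRS = {"a": {"href", "title"}}    # every other attribute is dropped
SAFE_SCHEMES = {"http", "https", "mailto"}  # relative URLs are rejected too
DROP_WITH_CONTENT = {"script", "style"}     # remove the element and its text
VOID_TAGS = {"br"}                          # allowed tags that take no end tag


def _safe_url(value: str) -> bool:
    # HTMLParser has already decoded character references such as "&#106;".
    # Browsers strip leading spaces/control characters and ignore tabs and
    # newlines in URLs, so remove all of them before checking the scheme.
    cleaned = "".join(ch for ch in value if ord(ch) > 0x20)
    try:
        return urlsplit(cleaned).scheme.lower() in SAFE_SCHEMES
    except ValueError:  # malformed URL: reject it
        return False


class WhitelistSanitizer(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # text and attribute values arrive decoded
        self.out: list[str] = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth += 1
            return
        if self._skip_depth or tag not in ALLOWED_TAGS:
            return  # drop the tag itself; any text inside it is still kept
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onmouseover, style, and anything not whitelisted
            if name == "href" and not _safe_url(value):
                continue  # drops javascript:, data:, vbscript: and similar URLs
            kept.append(f' {name}="{html.escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if self._skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(html.escape(data, quote=False))  # re-encode plain text

    # Comments, doctypes and processing instructions are ignored (dropped) by default.


def sanitize(fragment: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(sanitize("<b>Hello!</b>\n<script>alert('XSS!');</script>"))
print(sanitize('<a href="javascript:alert(\'XSS!\');">Dancing Bunnies!</a>'))
```

Whitelisting is the safer default because anything the policy does not name is removed, including tricks nobody thought of. This sketch does not re-balance unclosed tags, and a well-tested sanitizer such as AntiSamy, which another answer here suggests, is a safer bet than a hand-rolled one.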

I think you're confusing "encoding" with "scrubbing."

If you want to accept text from a user, you need to encode it as HTML before you render it as HTML. In this way, the text

a < b

is HTML-encoded as

a &lt; b

and rendered in an HTML browser (just as the user entered it) as:

a < b

If you want to accept HTML from a user (which it sounds like you do in this case), it's already in HTML format, so you don't want to call Html.Encode again. However, you may want to scrub it to remove certain markup that you don't allow (like script blocks).

C. Dragon 76
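The encode-versus-scrub distinction is easy to see in a few lines of Python, where the standard `html.escape` stands in for ASP.NET's `Html.Encode` (the two may not escape exactly the same set of characters). Encoding plain text gives the `a &lt; b` result described above; encoding output that is already HTML, or already encoded, produces visible markup or doubled entities instead of formatting, which is why calling `Html.Encode` on the editor's output is the wrong tool.

```python
import html

# Plain text typed by a user: encode it so the browser shows it literally.
plain = "a < b"
encoded = html.escape(plain)   # "a &lt; b", rendered as: a < b

# Rich-text editor output is already HTML; encoding it again turns the tags
# into visible text instead of formatting.
rich = "<b>Hello!</b>"
double = html.escape(rich)     # "&lt;b&gt;Hello!&lt;/b&gt;", rendered as: <b>Hello!</b>

# Encoding something that is already encoded doubles the entities.
twice = html.escape(encoded)   # "a &amp;lt; b", rendered as: a &lt; b

print(encoded)
print(double)
print(twice)
```

Rich-text output therefore needs scrubbing rather than encoding; only plain-text fields should go through the encoder.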

Security is the main reason.

Abe Miessler

Not only could a user enter JavaScript code or some other naughtiness, you also need HTML encoding in order to display certain characters on the page. You wouldn't want your page to break because your database contained: "Nice Page :->".

Also, if you are inserting the content into a database, be sure to "sanitize" the inputs to the database.

Vivian River
  • Star - are you saying I should encode before saving to the DB? – leora May 26 '10 at 16:28
  • You need to use `htmlencode` when you display string literals on a page that may contain characters such as `>` or `&`. When you save to the DB, you need to 'sanitize' your database inputs, which is actually a separate issue (but conceptually related). See here: http://www.unixwiz.net/techtips/sql-injection.html – Vivian River May 26 '10 at 16:33
  • I just realized that tiny editor seems to do this for you, so that's why I was confused about why everything was working without me doing anything – leora May 26 '10 at 16:43
  • Yes, but you **NEED** server-side validation, or anyone will be able to easily inject a ` – SLaks May 26 '10 at 16:44
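The database side, which Vivian River calls a separate issue, is normally handled with parameterized queries rather than by altering the text. A minimal sketch, assuming Python's built-in `sqlite3` as a stand-in for whatever database the application actually uses; the `posts` table and the `save_post`/`load_post` helpers are hypothetical. The HTML is stored exactly as submitted, and the quotes inside it cannot alter the SQL statement because the value is passed separately from the query text.

```python
import sqlite3
from typing import Optional

# In-memory database; the table layout is a hypothetical example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")


def save_post(conn: sqlite3.Connection, body_html: str) -> int:
    # The "?" placeholder passes body_html as a value, never as part of the
    # SQL text, so quotes inside the user's content cannot change the statement.
    cur = conn.execute("INSERT INTO posts (body) VALUES (?)", (body_html,))
    conn.commit()
    return cur.lastrowid


def load_post(conn: sqlite3.Connection, post_id: int) -> Optional[str]:
    row = conn.execute("SELECT body FROM posts WHERE id = ?", (post_id,)).fetchone()
    return row[0] if row else None


tricky = "<p>It's fine'); DROP TABLE posts; --</p>"
post_id = save_post(conn, tricky)
print(load_post(conn, post_id))  # the HTML comes back unchanged
```

Safe storage does not make the content safe to display: the value read back is exactly what was submitted, so it still has to be scrubbed or encoded before it is written into a page.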

Yes, it is to prevent JavaScript from executing if someone were to input a malicious string into the rich text editor. However, plain-text JavaScript is not your only concern; for example, this is an XSS attack:

<IMG SRC=&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083&#0000039&#0000041>

Take a look here for a range of different XSS attack vectors: http://ha.ckers.org/xss.html

Dustin Laine
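To see why a filter that searches the raw input for `javascript:` misses Dustin Laine's example, decode the `SRC` value the way a browser does. In this Python sketch, the standard `html.unescape` performs the character-reference decoding and `naive_is_safe` is a hypothetical blacklist check; the point is that security checks have to run on parsed, decoded values rather than on the raw string.

```python
import html

# The SRC value from Dustin Laine's example, exactly as an attacker would submit it.
obfuscated = (
    "&#0000106&#0000097&#0000118&#0000097&#0000115&#0000099&#0000114"
    "&#0000105&#0000112&#0000116&#0000058&#0000097&#0000108&#0000101"
    "&#0000114&#0000116&#0000040&#0000039&#0000088&#0000083&#0000083"
    "&#0000039&#0000041"
)


def naive_is_safe(value: str) -> bool:
    # Hypothetical blacklist filter that scans the raw text for "javascript:".
    return "javascript:" not in value.lower()


# Decode character references (even without trailing semicolons), as the browser does.
decoded = html.unescape(obfuscated)

print(naive_is_safe(obfuscated))  # True: the raw string slips past the blacklist
print(decoded)                    # javascript:alert('XSS')
```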

As an aside, MVC2 has implemented new functionality so that you no longer need to call Html.Encode

if you change your view syntax from

to

MVC will automatically encode for you. It makes things much easier/quicker. Again, MVC2 only.

John Ptacek

Another reason is that a user can input a few closing tags such as </div></table> and potentially break the layout of your website. If you are using an HTML editing tool, make sure the produced HTML is valid before embedding it in the page without encoding. Some server-side parsing is required to do this; you can use HtmlAgilityPack for it.

Atanas Korchev
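The server-side structural check Atanas Korchev describes can be sketched with Python's standard `html.parser` standing in for HtmlAgilityPack; the `TagBalancer` class and `balance_tags` function are hypothetical names. It drops end tags that close nothing, such as a stray `</div></table>`, and closes any elements the fragment left open, so the fragment cannot disturb the markup around it. It only repairs structure and does not remove scripts or unsafe attributes, so it complements a whitelist filter rather than replacing one.

```python
import html
from html.parser import HTMLParser

# Elements that never take an end tag.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagBalancer(HTMLParser):
    """Hypothetical helper: re-emits a fragment with every element properly closed."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self.open_tags: list[str] = []  # stack of elements that are still open

    def handle_starttag(self, tag, attrs):
        parts = [f"<{tag}"]
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{html.escape(value, quote=True)}"')
        parts.append(">")
        self.out.append("".join(parts))
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag such as </div> or </table>: drop it
        # Close the matching element plus anything opened inside it and left open.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def balance_tags(fragment: str) -> str:
    parser = TagBalancer()
    parser.feed(fragment)
    parser.close()
    # Close whatever the fragment left open, innermost first.
    parser.out.extend(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out)


print(balance_tags("<div><b>unclosed"))
print(balance_tags("text</div></table>more"))
```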

The primary reason to do what you're suggesting is to escape your output. Since you are accepting HTML and want to output it as HTML, you can't do that. What you need to do is filter out the things users can do that are insecure, or at least not what you want.

For that, let me suggest AntiSamy.

You can demo it here.

What you are doing has a lot of inherent risks, and you should consider it very carefully.

Flory