21

EDIT: For future reference, I'm using non-xhtml content type definition <!html>

I'm creating a website using Django, and I'm trying to embed arbitrary JSON data in my pages to be used by client-side JavaScript code.

Let's say my json object is {"foo": "</script>"}. If I embed this directly,

<script type='text/javascript'>JSON={"foo": "</script>"};</script>

The first </script> ends the script element in the middle of the JSON object. (It also makes the site vulnerable to XSS, since this JSON object will be dynamically generated.)

If I use django's HTML escape function, the resulting output is:

<script type='text/javascript'>JSON={&quot;foo&quot;: &quot;&lt;/script&gt;&quot;};</script> 

and the browser cannot interpret the contents of the <script> tag, because in HTML entity references are not decoded inside <script>.

The question I have here is,

  1. Which characters am I supposed to escape / not escape in this situation?
  2. Is there an automated way to perform this in Python / Django?
Jeeyoung Kim
  • You can use entity references (<, >) within – yonran Nov 14 '10 at 07:19
  • @yonran, so, escaping only slashes by running string-replacement for / to \/ is good enough? – Jeeyoung Kim Nov 14 '10 at 11:21
  • yes, that should be the case. For more information about how browsers parse the script tag, see HTML 5 tokenization: – yonran Nov 14 '10 at 22:27
  • Sorry, I was wrong. Let me clarify. – yonran Nov 14 '10 at 22:35

4 Answers

11

If you are using XHTML, you would be able to use entity references (&lt;, &gt;, &amp;) to escape any string you want within <script>. You would not want to use a <![CDATA[...]]> section, because the sequence "]]>" can't be expressed within a CDATA section, and you would have to change the script to express ]]>.

But you're probably not using XHTML. If you're using regular HTML, the <script> tag acts somewhat like a CDATA section in XML, except that it has even more pitfalls. It ends with </script>. There are also arcane rules to allow <!-- document.write("<script>...</script>") --> (the comments and <script> opening tag must both be present for </script> to be passed through). The compromise that the HTML5 editors adopted for future browsers is described in HTML 5 tokenization and CDATA Escapes.

I think the takeaway is that you must prevent </script> from occurring in your JSON, and to be safe you should also avoid <script>, <!--, and --> to prevent runaway comments or script tags. I think it's easiest just to replace < with \u003c and --> with --\>
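
To make the replacement above concrete, here is a minimal Python sketch using only the standard library. The helper name `json_for_script` is hypothetical, and escaping `>` and `&` in addition to `<` is an extra precaution I'm assuming is wanted, not something the HTML rules strictly require. Since `<`, `>` and `&` can only occur inside string literals in the output of `json.dumps`, swapping them for `\uXXXX` escapes should leave the parsed value unchanged.

```python
import json

# Characters that can open or close tags, comments, or entity references
# in HTML, mapped to their JSON \uXXXX escapes.
_SCRIPT_ESCAPES = {
    "<": "\\u003c",
    ">": "\\u003e",
    "&": "\\u0026",
}


def json_for_script(value):
    """Serialize value to JSON that contains no '<', '>' or '&'."""
    text = json.dumps(value)
    for char, escape in _SCRIPT_ESCAPES.items():
        text = text.replace(char, escape)
    return text


payload = json_for_script({"foo": "</script>"})
print(payload)  # {"foo": "\u003c/script\u003e"}
```

The result can be written between <script> and </script> as-is (in a Django template it would also have to be marked safe so that the HTML escaping shown in the question is not applied on top). Later Django releases also ship a `json_script` template filter that applies similar escaping.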

yonran
  • 1
    I'll add that, according to Google's Gson library, you need to escape the HTML characters <, >, & and = to make your JSON string safe to embed. http://google-gson.googlecode.com/svn/trunk/gson/docs/javadocs/index.html – reconbot Feb 16 '12 at 19:59
6

I tried backslash escaping the forward slash and that seems to work:

<script type='text/javascript'>JSON={"foo": "<\/script>"};</script>

Have you tried that?
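
If the JSON is generated in Python, the same backslash trick could be applied automatically. This is only a sketch: the function name `escape_closing_tags` is made up for illustration, and it handles `</` only, not `<!--` or `<script>`.

```python
import json


def escape_closing_tags(value):
    """Serialize value to JSON with every "</" written as "<\\/"."""
    # "\/" is a valid JSON (and JavaScript) string escape for "/",
    # so the parsed value is unchanged.
    return json.dumps(value).replace("</", "<\\/")


embedded = escape_closing_tags({"foo": "</script>"})
print(embedded)  # {"foo": "<\/script>"}
```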


On a side note, I am surprised that the embedded </script> tag in a string breaks the javascript. Couldn't believe it at first but tested in Chrome and Firefox.

slebetman
  • 1
    embedded breaking is kinda expected (i thought it was strange too), because that means js parsing must be done along the HTML parsing (html parser must be aware of the semantics of javascript text), which seems very complicated to me. – Jeeyoung Kim Nov 14 '10 at 07:18
  • 1
    Yep, HTML parsers as a rule don't speak JavaScript. The contents of the script tags are passed to the interpreter only after the HTML is parsed, and HTML doesn't say anything about tags not being tags when they're between quotation marks! – Nicholas Knight Nov 14 '10 at 07:22
  • 1
    Yes, that is expected - the usual trick to prevent it is to break up the tag into two - `""` – Yi Jiang Nov 14 '10 at 07:40
0

For this case in Python, I have opened a bug in the bug tracker. However, the rules are indeed complicated, as <!-- and <script> play together in quite evil ways even in the adopted HTML5 parsing rules. BTW, "\>" is not a valid JSON escape, so it would be better to replace it with "\u003E". The absolutely safe escaping should therefore be to write < and > as \u003C and \u003E, AND to escape a couple of other evil characters mentioned in the Python bug...
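
A quick way to see the difference between the escapes is to feed them to Python's own JSON parser. This is just an illustrative check (the helper `is_valid_json` is made up here); other strict JSON parsers should behave the same way, but I have only reasoned about the standard `json` module.

```python
import json


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


# Raw strings keep the backslashes exactly as they would appear in the page.
print(is_valid_json(r'"--\>"'))        # False: \> is not a JSON escape
print(is_valid_json(r'"--\u003e"'))    # True
print(is_valid_json(r'"<\/script>"'))  # True: \/ is a JSON escape
```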

0

I would do something like this:

<script type='text/javascript'>JSON={"foo": "</" + "script>"};</script>
Elmer