My solution was to subclass TextNode and override the method that does the escaping.
package org.jsoup.nodes;

public class UnescapedTextNode extends TextNode
{
    public UnescapedTextNode( final String text, final String baseUri )
    {
        super( text, baseUri );
    }

    @Override
    void outerHtmlHead(
        final StringBuilder accum,
        final int depth,
        final Document.OutputSettings out )
    {
        //String html = Entities.escape( getWholeText(), out ); // Don't escape!
        String html = getWholeText();
        if ( out.prettyPrint() &&
             parent() instanceof Element &&
             !Element.preserveWhitespace( parent() ) )
        {
            html = normaliseWhitespace( html );
        }
        if ( out.prettyPrint() &&
             ( ( siblingIndex() == 0 &&
                 parentNode instanceof Element &&
                 ( (Element)parentNode ).tag().formatAsBlock() &&
                 !isBlank() ) ||
               ( out.outline() &&
                 siblingNodes().size() > 0 &&
                 !isBlank() ) ) )
        {
            indent( accum, depth, out );
        }
        accum.append( html );
    }
}
This is pretty much a verbatim copy of TextNode.outerHtmlHead() (as originally written by Jonathan Hedley). I've just commented out the escaping part. This is how I used it:
// ... assuming head is of type Element and refers to the <head> of the document.
final String message = "Hello World!";
final String messageScript = "alert( \"" + message + "\" );";
final Element messageScriptEl = head.appendElement( "script" ).
    attr( "type", "text/javascript" );
final TextNode messageScriptTextNode = new UnescapedTextNode(
    messageScript,
    messageScriptEl.baseUri() );
messageScriptEl.appendChild( messageScriptTextNode );
// ... etc
Further on, calling Document.toString() or Document.outerHtml() produces output in which the text inside the newly created script tag is unescaped, i.e.:
<script type="text/javascript">alert( "Hello World!" );</script>
instead of:
<script type="text/javascript">alert( &quot;Hello World!&quot; );</script>
as was occurring previously.
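The escaped form matters because browsers treat the content of a script element as raw text and do not decode character references there, so &quot; reaches the JavaScript parser literally and the script fails to parse. The following is only a rough stand-in to show the difference between the two outputs; the helper name escapeForHtmlText is hypothetical and this is not jsoup's Entities.escape implementation:

```java
final class EscapeDemo {

    // Hypothetical stand-in for an HTML text-escaping step.
    // Assumption: this is NOT jsoup's Entities.escape, just an illustration.
    static String escapeForHtmlText(String s) {
        return s.replace("&", "&amp;")   // must run first so later entities survive
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        String script = "alert( \"Hello World!\" );";
        String open = "<script type=\"text/javascript\">";

        // Escaped: the quotes become entities, which is invalid JavaScript.
        System.out.println(open + escapeForHtmlText(script) + "</script>");

        // Unescaped: the script text is emitted as written.
        System.out.println(open + script + "</script>");
    }
}
```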
There were two 'gotchas' that I found:
- The UnescapedTextNode class needs to be loaded by the same classloader that loads the original jsoup library. This is because I've overridden a package-private method above, and package-private access is only granted within the same runtime package (same package name, same defining classloader). (Thanks to Jeff Sinclair for the article that pointed this out to me.) The pertinent point is that
A field or method R is accessible to a class or interface D if and only if any of the following conditions are true:
- …
- R is package private and is declared by a class in the same runtime package as D.
which is in the JVM specification under Access Control (5.4.4).
- This is a very risky thing to do, as you're effectively cutting away the safety net that stops you putting unsanitised data into your document. Ensure that anything you add to this text node that comes from application users does not contain HTML tags (in particular </script>), or you're going to have a very bad time with XSS, CSRF, etc.
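
Both gotchas can be guarded against with a couple of small helpers. What follows is a minimal sketch using only the standard library; the names GotchaChecks, sameRuntimePackage and toJsStringLiteral are made up for illustration and are not part of jsoup. The first helper checks the runtime-package condition (same package name and same defining classloader). The second turns untrusted text into a JavaScript string literal in which quotes, backslashes, control characters and the characters <, > and & are escaped, so the value cannot close the surrounding script element. It is not a complete sanitiser, and it only helps when the value is used as a string literal:

```java
final class GotchaChecks {

    // Gotcha 1: two classes share a runtime package only if they have the
    // same package name AND the same defining classloader.
    static boolean sameRuntimePackage(Class<?> a, Class<?> b) {
        return packageNameOf(a).equals(packageNameOf(b))
                && a.getClassLoader() == b.getClassLoader();
    }

    // Package name derived from the binary class name.
    private static String packageNameOf(Class<?> c) {
        String name = c.getName();
        int lastDot = name.lastIndexOf('.');
        return lastDot < 0 ? "" : name.substring(0, lastDot);
    }

    // Gotcha 2: wrap untrusted text in a quoted JavaScript string literal.
    // Quotes and backslashes get a backslash prefix; control characters,
    // angle brackets, ampersands and the two Unicode line separators are
    // written as four-digit hex escapes so the HTML parser never sees them.
    static String toJsStringLiteral(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20 || c == '<' || c == '>' || c == '&'
                    || c == 0x2028 || c == 0x2029) {
                sb.append(String.format("\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // A hostile message that tries to break out of the script element.
        String message = "</script><script>alert(1)//";
        System.out.println("alert( " + toJsStringLiteral(message) + " );");

        // Both classes come from java.lang and the bootstrap loader.
        System.out.println(sameRuntimePackage(String.class, Integer.class));
    }
}
```

In the usage code above, the script text would then be built as "alert( " + toJsStringLiteral( message ) + " );" rather than by concatenating the raw message between quotes, and sameRuntimePackage( UnescapedTextNode.class, TextNode.class ) could be checked once at startup to fail fast if the classes ended up in different classloaders.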