
I'm trying to modify the bootstrap JavaScript that Vaadin sends to the browser. Here's the Vaadin forum link about this issue: https://vaadin.com/forum#!/thread/4252604

Vaadin uses Jsoup, so I am using the Jsoup APIs to find the right place in the Vaadin payload to modify the JavaScript. When I call the Jsoup API like this:

element.html(newHTML)

anything that was in newHTML gets escaped. So, for example, if newHTML was:

alert("hi");

then calling the Jsoup API would result in:

alert(&quot;hi&quot;);

I thought I could disable this Jsoup escaping by doing something like this:

element.ownerDocument().outputSettings().escapeMode(...)

but ownerDocument() is null, so I don't think that's an option. Does Jsoup have a way around this limitation, so that JavaScript containing double quotes (") and even opening/closing tag brackets (<, >) is output unescaped?

Kevin

2 Answers


Apparently,

element.childNode(0).attr("data", html);

does the trick, provided element is the "script" element and html is the JavaScript source.

Kevin
  • This doesn't work for me. What version of JSoup are you running? I have 1.7.2. The only attribute that has any effect for me on the textnode child of a script element is "text", not "data". Unfortunately, the output is still escaped. I believe the escaping occurs on output - not on setting the node or its attributes. – jr. Nov 03 '13 at 07:28
  • I might add too that I'm not using Vaadin - I just realised that that could be of significance. – jr. Nov 03 '13 at 09:31
  • I'm not sure of the JSoup version, but the Vaadin version is 7.1.7. – Kevin Nov 03 '13 at 13:15
  • Have you confirmed that this worked? Do you have the source of a full method showing the bootstrap modification that you're performing (provided your code is able to be shared)? I've tried messing around using what you've provided here, but I just can't see how it would work with stock JSoup - Vaadin may use a modified version of JSoup. Does JSoup come with Vaadin, or did you grab it yourself? – jr. Nov 04 '13 at 00:56
  • I do have a slightly dodgy workaround that works without Vaadin - I'll post it as an answer later today. – jr. Nov 04 '13 at 00:57

My solution was to subclass TextNode and override the method that does the escaping.

package org.jsoup.nodes;

public class UnescapedTextNode extends TextNode
{
    public UnescapedTextNode( final String text, final String baseUri )
    {
        super( text, baseUri );
    }

    @Override
    void outerHtmlHead(
        final StringBuilder accum,
        final int depth,
        final Document.OutputSettings out )
    {
        //String html = Entities.escape( getWholeText(), out ); // Don't escape!
        String html = getWholeText();
        if ( out.prettyPrint() &&
             parent() instanceof Element &&
             !Element.preserveWhitespace( parent() ) )
        {
             html = normaliseWhitespace( html );
        }
        if ( out.prettyPrint() &&
             ( ( siblingIndex() == 0 &&
                 parentNode instanceof Element &&
                 ( (Element)parentNode ).tag().formatAsBlock() &&
                   !isBlank() ) ||
                 ( out.outline() &&
                   siblingNodes().size() > 0 &&
                   !isBlank() ) ) )
        {
            indent( accum, depth, out );
        }
        accum.append( html );
    }
}

This is pretty much a verbatim copy of TextNode.outerHtmlHead() (as originally written by Jonathan Hedley). I've just commented out the escaping part. This is how I used it:

// ... assuming head is of type Element and refers to the <head> of the document.
final String message = "Hello World!";
final String messageScript = "alert( \"" + message + "\" );";
final Element messageScriptEl = head.appendElement( "script" ).
    attr( "type", "text/javascript" );
final TextNode messageScriptTextNode = new UnescapedTextNode(
    messageScript,
    messageScriptEl.baseUri() );
messageScriptEl.appendChild( messageScriptTextNode );
// ... etc

After that, calling Document.toString() or Document.outerHtml() produces output in which the text inside the newly created script tag is unescaped, i.e.:

<script type="text/javascript">alert( "Hello World!" );</script>

instead of:

<script type="text/javascript">alert( &quot;Hello World!&quot; );</script>

as was occurring previously.
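
The difference between the two outputs comes down to whether the node escapes its content when it is serialised. The following is a minimal, self-contained sketch of that idea, not jsoup's real code: the names EscapeOnOutputSketch, EscapedText, RawText and the simplified escape() are all made up for illustration, and jsoup's own Entities.escape() handles more cases than this.

```java
/** Toy model (not jsoup's classes) of escaping being applied at output time. */
final class EscapeOnOutputSketch {

    /** Simplified stand-in for entity escaping; an assumption, not jsoup's algorithm. */
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;");
    }

    /** A node only knows how to append itself to the output buffer. */
    interface Node {
        void outerHtml(StringBuilder accum);
    }

    /** Stores the text raw and escapes it only when it is written out. */
    static final class EscapedText implements Node {
        private final String text;

        EscapedText(String text) {
            this.text = text;
        }

        @Override
        public void outerHtml(StringBuilder accum) {
            accum.append(escape(text));
        }
    }

    /** Same stored text, but written out exactly as it was given. */
    static final class RawText implements Node {
        private final String text;

        RawText(String text) {
            this.text = text;
        }

        @Override
        public void outerHtml(StringBuilder accum) {
            accum.append(text);
        }
    }

    /** Wraps a single child node in a script element and serialises it. */
    static String script(Node child) {
        StringBuilder accum = new StringBuilder("<script type=\"text/javascript\">");
        child.outerHtml(accum);
        return accum.append("</script>").toString();
    }

    public static void main(String[] args) {
        String js = "alert( \"Hello World!\" );";
        System.out.println(script(new EscapedText(js))); // quotes become &quot;
        System.out.println(script(new RawText(js)));     // quotes are left alone
    }
}
```

Both nodes hold the same string; only the serialisation step differs, which is why changing what is stored in the node has no effect on the escaping.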

There were two 'gotchas' that I found:

  • The UnescapedTextNode class needs to be loaded by the same classloader that loads the original jsoup library. This is because I've overridden a package-private method above, and the JVM only treats that as an override when both classes are in the same runtime package. (Thanks to Jeff Sinclair for the article that pointed this out to me.) The pertinent point is that

    A field or method R is accessible to a class or interface D if and only if any of the following conditions are true:

    • R is package private and is declared by a class in the same runtime package as D.

    which is in the JVM specification under Access Control (5.4.4).

  • This is a very risky thing to do, as you're effectively cutting away the safety net that stops you putting unsanitised data into your document. Ensure that anything you add to this text node that comes from application users does not contain HTML tags (in particular script tags), or you're going to have a very bad time with XSS, CSRF, etc.
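
Both gotchas can be guarded against in code. What follows is a rough sketch using only the JDK; GotchaChecks, sameRuntimePackage() and toJsStringLiteral() are hypothetical names, not part of jsoup or Vaadin. The first helper checks the "same classloader, same package" condition, and the second turns untrusted text into a JavaScript string literal that contains no raw angle brackets, so it cannot close the surrounding script element. It is an illustration of the idea rather than a complete XSS defence.

```java
/** Hypothetical helpers illustrating the two gotchas; neither is part of jsoup. */
final class GotchaChecks {

    /** True if both classes share a defining classloader and a package name. */
    static boolean sameRuntimePackage(Class<?> a, Class<?> b) {
        return a.getClassLoader() == b.getClassLoader()
                && packageName(a).equals(packageName(b));
    }

    /** Package part of the class name (empty for the default package). */
    private static String packageName(Class<?> c) {
        String name = c.getName();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(0, dot);
    }

    /** Encodes value as a double-quoted JavaScript string literal without raw markup characters. */
    static String toJsStringLiteral(String value) {
        StringBuilder out = new StringBuilder(value.length() + 2).append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':
                    out.append("\\\"");
                    break;
                case '\\':
                    out.append("\\\\");
                    break;
                case '<':
                case '>':
                case '&':
                case '\'':
                    // Hex-escape characters that matter to the HTML parser.
                    out.append(String.format("\\u%04X", (int) c));
                    break;
                default:
                    // Control characters and the two Unicode line separators are hex-escaped too.
                    if (c < 0x20 || c == 0x2028 || c == 0x2029) {
                        out.append(String.format("\\u%04X", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // Two JDK classes from java.lang stand in for TextNode and its subclass here.
        System.out.println(sameRuntimePackage(String.class, Integer.class));
        String userInput = "</script><script>evil()";
        System.out.println("alert( " + toJsStringLiteral(userInput) + " );");
    }
}
```

In the setup described above, the first check would be called with UnescapedTextNode.class and TextNode.class, and the message would be passed through toJsStringLiteral() before being concatenated into the script source.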
jr.