
I am trying to test a website that uses JavaScript to render most of its HTML. With the HtmlUnit browser, how can I access the HTML generated by the JavaScript? I looked through the documentation but wasn't sure what the best approach would be.

WebClient webClient = new WebClient();
HtmlPage currentPage = webClient.getPage("some url");
String source = currentPage.asXml();
System.out.println(source);

This is an easy way to get back the HTML of the page, but it does not seem to include what the JavaScript generates. Should I use the DomNode, or is there another way to access the HTML generated by the JavaScript?

rush66

2 Answers


You have to give the JavaScript some time to execute.

See the working sample below. The bucket divs aren't in the original source; they only appear after the JavaScript has run.

import java.io.IOException;
import java.net.MalformedURLException;
import java.util.List;
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class GetPageSourceAfterJS {
    public static void main(String[] args) throws FailingHttpStatusCodeException, MalformedURLException, IOException {
        java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(java.util.logging.Level.OFF); /* turns off annoying HtmlUnit warnings; comment out to see them again */
        WebClient webClient = new WebClient();
        String url = "http://www.futurebazaar.com/categories/Home--Living-Luggage--Travel-Airbags--Duffel-bags/cid-CU00089575.aspx";
        System.out.println("Loading page now: "+url);
        HtmlPage page = webClient.getPage(url);
        webClient.waitForBackgroundJavaScript(30 * 1000); /* waits up to 30 s for background JavaScript to finish */

        String pageAsXml = page.asXml();
        System.out.println("Contains bucket? --> "+pageAsXml.contains("bucket"));

        //get divs which have a 'class' attribute of 'bucket'
        List<?> buckets = page.getByXPath("//div[@class='bucket']");
        System.out.println("Found "+buckets.size()+" 'bucket' divs.");

        //System.out.println("#FULL source after JavaScript execution:\n "+pageAsXml);
    }
}

Output:

Loading page now: http://www.futurebazaar.com/categories/Mobiles-Mobile-Phones/cid-CU00089697.aspx?Rfs=brandZZFly001PYXQcurtrayZZBrand
Contains bucket? --> true
Found 3 'bucket' divs.
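
A note on the XPath in the sample: `@class='bucket'` only matches when the whole attribute value is exactly `bucket`, so a div with `class="bucket featured"` would be skipped. The sketch below shows the difference using only the JDK's built-in XPath support and a made-up three-div document (the class name `ClassTokenXPath` and the sample markup are assumptions for illustration). The token-matching expression is plain XPath 1.0, so it should also be usable with `page.getByXPath(...)`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ClassTokenXPath {
    // Exact match on the whole attribute value, as in the sample above.
    public static final String EXACT = "//div[@class='bucket']";

    // Token match: also finds class="bucket featured", but not class="buckets".
    public static final String TOKEN =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' bucket ')]";

    // Parses the XML string and returns how many nodes the XPath selects.
    public static int count(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup, not taken from the real page.
        String xml = "<html><body>"
            + "<div class='bucket'/>"
            + "<div class='bucket featured'/>"
            + "<div class='buckets'/>"
            + "</body></html>";
        System.out.println("exact: " + count(xml, EXACT));
        System.out.println("token: " + count(xml, TOKEN));
    }
}
```

If the page only ever uses `class="bucket"` on its own, the simpler expression from the sample is enough.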

HtmlUnit version used:

<dependency>
    <groupId>net.sourceforge.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>2.12</version>
</dependency>
acdcjunior

Assuming the issue is HTML generated by JavaScript as a result of AJAX calls, have you tried the suggestions in the 'AJAX does not work' section of the HtmlUnit FAQ?
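
One approach often used for AJAX-rendered content is to poll for the element you expect instead of waiting a fixed amount of time. The following is a minimal sketch of such a polling helper in plain Java; `PollUntil` and `waitFor` are hypothetical names, not part of HtmlUnit, and the condition in `main` is a stand-in for a real page check.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {
    /**
     * Re-checks the condition every intervalMs until it holds or timeoutMs
     * has elapsed. Returns true if the condition held, false on timeout
     * or if the thread was interrupted.
     */
    public static boolean waitFor(BooleanSupplier condition, long timeoutMs, long intervalMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition that becomes true after about 200 ms.
        // With HtmlUnit it could be something like (assumption):
        //   () -> !page.getByXPath("//div[@class='bucket']").isEmpty()
        boolean ready = waitFor(() -> System.currentTimeMillis() - start >= 200, 2000, 50);
        System.out.println("ready: " + ready);
    }
}
```

The helper returns as soon as the condition holds, so a generous timeout costs nothing when the page renders quickly.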

There's also a section in the howtos about how to use HtmlUnit with JavaScript.

If your question isn't answered here, I think we'll need some more specifics to be able to help.

brabster