I am using the HtmlUnit library for Java to manipulate websites programmatically. I can't find a working solution to my problem: how can I determine that all AJAX calls have finished and return a completely loaded web page? Here is what I have tried:

First I create a WebClient instance and call my method processWebPage(String url, WebClient webClient):

WebClient webClient = null;
try {
    webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
    // Don't abort on script or HTTP errors, and make sure JavaScript runs.
    webClient.setThrowExceptionOnScriptError(false);
    webClient.setThrowExceptionOnFailingStatusCode(false);
    webClient.setJavaScriptEnabled(true);
    // Re-synchronize asynchronous XHR calls where HtmlUnit can.
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());
} catch (Exception e) {
    System.out.println("Error");
}
HtmlPage currentPage = processWebPage("http://www.example.com", webClient);

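(For reference, in newer HtmlUnit releases most of these switches have moved to WebClientOptions; something like this should be the equivalent, assuming a recent 2.x version where BrowserVersion.FIREFOX exists:)

// Assumes a recent HtmlUnit 2.x release; getOptions() replaces the
// per-setting setters used on WebClient above.
WebClient webClient = new WebClient(BrowserVersion.FIREFOX);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
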
And here is my method which should return a completely loaded web page:

private static HtmlPage processWebPage(String url, WebClient webClient) {
    HtmlPage page = null;
    try {
        page = webClient.getPage(url);
    } catch (Exception e) {
        System.out.println("Get page error");
    }
    // waitForBackgroundJavaScript waits up to the given number of milliseconds
    // and returns how many background JavaScript jobs are still running.
    int z = webClient.waitForBackgroundJavaScript(1000);
    int counter = 1000;
    while (z > 0) {
        counter += 1000;
        z = webClient.waitForBackgroundJavaScript(counter);
        if (z == 0) {
            break;
        }
        synchronized (page) {
            System.out.println("wait");
            try {
                page.wait(500);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
    System.out.println(page.asXml());
    return page;
}

The z variable should become 0 once there is no JavaScript left to run.
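
In other words, what I am effectively trying to do is something like the following (the 10-second cap here is just an arbitrary example value, not something specific to my page):

// Sketch of the intent: wait in short slices until no background JavaScript
// jobs remain, with an overall cap (the 10 seconds is an arbitrary choice).
long deadline = System.currentTimeMillis() + 10000;
while (webClient.waitForBackgroundJavaScript(500) > 0
        && System.currentTimeMillis() < deadline) {
    // waitForBackgroundJavaScript already blocks, so the loop body stays empty
}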

Any thoughts? Thanks in advance.

EDIT: I found a partially working solution to my problem, but it requires knowing what the response page should look like. For example, if a completely loaded page contains the text "complete", my solution would be:

HtmlPage page = null;
int PAGE_RETRY = 10;
try {
    page = webClient.getPage("http://www.example.com");
} catch (Exception e) {
    e.printStackTrace();
}
// Re-fetch the page with an increasing delay until the marker text appears
// or the retry limit is reached.
for (int i = 0; !page.asXml().contains("complete") && i < PAGE_RETRY; i++) {
    try {
        Thread.sleep(1000 * (i + 1));
        page = webClient.getPage("http://www.example.com");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

But what would the solution be if I don't know what a completely loaded page looks like?

1 Answer


Try this:

HtmlPage page = null;
try {
    page = webClient.getPage(url);
} catch (Exception e) {
    System.out.println("Get page error");
}
// The window's job manager (com.gargoylesoftware.htmlunit.javascript.background.JavaScriptJobManager)
// reports how many background JavaScript jobs (AJAX calls, timers, ...) are still pending.
JavaScriptJobManager manager = page.getEnclosingWindow().getJobManager();
while (manager.getJobCount() > 0) {
    try {
        Thread.sleep(1000);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}
System.out.println(page.asXml());
return page;
brnfd
  • 464
  • 2
  • 8
  • Sometimes it takes forever to load all scripts but it works, thank you! – justasd Jun 10 '13 at 18:09
  • 2
    Just a remark even if the post i old, I discover problems when you have timers running on your page. So even with the waitforBackground method you are waitin up to the end of the time given in parameter. – Alain BUFERNE Sep 20 '14 at 11:52
  • I've tried this approach on two different pages I was having this problem with. It worked with the first page, but with the second page, the job count doesn't go below 5. Stopping the jobs with manager.stopJob(manager.getEarliestJob().getId()) doesn't help either. Any suggestions? – Jack Sep 22 '16 at 16:06
  • 1
    I have the same problem as @Jack. Job count just does not reach zero on some pages. – Makan Aug 11 '17 at 10:19
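
As the comments above point out, the job count may never reach zero on pages that run recurring timers (setInterval), so it is safer to put an upper bound on the wait. A rough sketch of such a bounded variant (the 30-second cap is an arbitrary assumption):

JavaScriptJobManager manager = page.getEnclosingWindow().getJobManager();
// Give up after an arbitrary cap (30 s here) even if jobs remain, since pages
// with recurring timers may never report a job count of zero.
long deadline = System.currentTimeMillis() + 30000;
while (manager.getJobCount() > 0 && System.currentTimeMillis() < deadline) {
    try {
        Thread.sleep(500);
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
    }
}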