I am trying to extract data for a class project from a webpage (a page that shows search results). Specifically, it's this page:
I just want to extract the titles of the products.
I'm using the following code:
    final WebClient webClient = new WebClient(BrowserVersion.CHROME);
    webClient.getOptions().setThrowExceptionOnScriptError(false);
    webClient.getOptions().setJavaScriptEnabled(true);
    webClient.setAjaxController(new NicelyResynchronizingAjaxController());

    final HtmlPage page = webClient.getPage(itemPageURL);

    int tries = 20; // Number of tries, to avoid an infinite loop
    while (tries > 0) {
        tries--;
        synchronized (page) {
            page.wait(2000); // Wait 2 seconds between tries
        }
    }

    // Returns the number of background JavaScript jobs still running after the timeout
    int numThreads = webClient.waitForBackgroundJavaScript(1000000L);

    // Dump the resulting DOM to a file for inspection
    PrintWriter pw = new PrintWriter("test-target-search.txt");
    pw.println(page.asXml());
    pw.close();
The resulting page does not contain the product information that is shown in a web browser. I suspect the AJAX calls haven't completed, but I'm not sure.
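If the AJAX calls are the issue, the fixed wait in the code above could presumably be replaced by a poll that stops as soon as some condition holds. Below is a minimal sketch of that idea. `PollUntil` and `waitUntil` are hypothetical names for illustration, not part of HtmlUnit, and the condition in `main` is only a stand-in so the example runs without any library.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of HtmlUnit): polls a condition a bounded number of times.
final class PollUntil {

    /**
     * Checks the condition up to maxTries times, sleeping intervalMillis after each failed check.
     * Returns true as soon as the condition holds, false if it never does
     * or if the thread is interrupted while sleeping.
     */
    static boolean waitUntil(BooleanSupplier condition, int maxTries, long intervalMillis) {
        for (int i = 0; i < maxTries; i++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true roughly 50 ms after start.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 50, 20, 10);
        System.out.println("ready = " + ready);
    }
}
```

With HtmlUnit, the condition might be something like `() -> webClient.waitForBackgroundJavaScript(2000) == 0`, or a check that `page.asXml()` contains some text known to appear in the product list. Both are assumptions about what signals that the page has finished loading.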
Any help would be greatly appreciated. Thanks!