I am using HtmlUnit to download URL and the webpage is using lazy loading (I think) to load some of the images. Which settings should I use in HtmlUnit so that I can get those images.
For example, this is one of the URLs I am trying to download-
The product images (after first few) have dummy src value-
As you can see the src tag has dummy value and actual image url is stored in imgurl attribute. I think the webpage uses some javascript to change the src attribute by correct value once we scroll down.
This is my sample code-
webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setActiveXNative(false);
webClient.getOptions().setAppletEnabled(false);
webClient.getOptions().setDoNotTrackEnabled(true);
webClient.getOptions().setPopupBlockerEnabled(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.setCssErrorHandler(new SilentCssErrorHandler());
Page page = webClient.getPage(url);
I have tried the following-
1) Increase window height-
webClient.getCurrentWindow().setInnerHeight(60000);
webClient.getCurrentWindow().setInnerWidth(60000);
2) Try to scroll down after page is downloaded
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(true);
webClient.waitForBackgroundJavaScript(10 * 1000);
HtmlPage page = (HtmlPage) webClient.getPage(url);
page.getBody().type(KeyboardEvent.DOM_VK_PAGE_DOWN);
Thread.sleep(3000);
String html = page.asXml();
But so far, I have not been able to get the correct src URL. If anyone has successfully fixed this lazy loading issue, please suggest some workarounds.
thank you!