
I am using HtmlUnit to download a URL, and the page appears to use lazy loading for some of its images. Which HtmlUnit settings should I use so that I can get those images?

For example, this is one of the URLs I am trying to download:

http://www.ebay.com.au/sch/i.html?_from=R40&_trksid=p2050601.m570.l1313.TR10.TRC0.A0.H0.Xiphone6s.TRS0&_nkw=iphone6s&_sacat=0

The product images (after the first few) have a dummy src value. The actual image URL is stored in the imgurl attribute instead. I think the page uses JavaScript to replace the src value with the correct URL once we scroll down.
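
A possible fallback, if the scroll handler cannot be triggered, is to skip the script and read imgurl directly. The sketch below is only an illustration of that idea on a raw HTML string: the class name LazyImageUrls, its methods, and the example.com URLs are made up, and the regex is a stand-in for a real HTML parser. With HtmlUnit the same choice could presumably be made per image element by calling getAttribute("imgurl") and falling back to src when it is empty.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: prefers the lazy-load attribute (imgurl) over the placeholder src. */
public class LazyImageUrls {

    // Matches a whole <img ...> tag; good enough for a sketch, not a real HTML parser.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the value of a double-quoted attribute inside one tag, or "" if it is absent. */
    public static String attribute(String tag, String name) {
        Matcher m = Pattern.compile(
                "\\s" + Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"",
                Pattern.CASE_INSENSITIVE).matcher(tag);
        return m.find() ? m.group(1) : "";
    }

    /** Picks imgurl when it is present, otherwise falls back to src. */
    public static String resolve(String src, String imgurl) {
        return (imgurl == null || imgurl.isEmpty()) ? src : imgurl;
    }

    /** Collects the resolved image URL of every <img> tag in the markup. */
    public static List<String> imageUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMG_TAG.matcher(html);
        while (m.find()) {
            String tag = m.group();
            urls.add(resolve(attribute(tag, "src"), attribute(tag, "imgurl")));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up markup, assuming the attribute layout described above.
        String html = "<img src=\"http://example.com/real1.jpg\">"
                + "<img src=\"http://example.com/placeholder.gif\""
                + " imgurl=\"http://example.com/real2.jpg\">";
        System.out.println(imageUrls(html));
    }
}
```

This only helps on pages that keep the real URL in an attribute of the img element; sites that fetch image URLs through a separate request would still need the script to run.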

This is my sample code:

WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38);
webClient.getOptions().setActiveXNative(false);
webClient.getOptions().setAppletEnabled(false);
webClient.getOptions().setDoNotTrackEnabled(true);
webClient.getOptions().setPopupBlockerEnabled(true);
webClient.getOptions().setPrintContentOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.setCssErrorHandler(new SilentCssErrorHandler());
Page page = webClient.getPage(url);

I have tried the following:

1) Increasing the window height and width:

webClient.getCurrentWindow().setInnerHeight(60000);
webClient.getCurrentWindow().setInnerWidth(60000);

2) Scrolling down after the page has loaded:

webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setCssEnabled(true);
webClient.waitForBackgroundJavaScript(10 * 1000);
HtmlPage page = (HtmlPage) webClient.getPage(url);
page.getBody().type(KeyboardEvent.DOM_VK_PAGE_DOWN);
Thread.sleep(3000);
String html = page.asXml();

So far, I have not been able to get the correct src URLs. If anyone has solved this lazy loading issue, please suggest a workaround.

Thank you!

user2747986
  • Are you a developer for ebay, or are you using HtmlUnit, a unit testing framework, as a webcrawler? – GolezTrol Dec 29 '15 at 20:02
  • Not a developer for ebay. Writing a webcrawler. Ebay URL is one example. I could not list other URLs here. I was able to quickly find similar lazy loading issue on ebay site, hence posted that URL. – user2747986 Dec 29 '15 at 20:14
