I'm using HtmlUnit to parse an HTML page that contains JavaScript. The structure of the page, as shown in Chrome Developer Tools, is: (screenshot not included)

My code is as follows:

    WebClient wc=new WebClient(BrowserVersion.INTERNET_EXPLORER_11);
    wc.getOptions().setUseInsecureSSL(true);
    wc.getOptions().setJavaScriptEnabled(true);
    wc.getOptions().setCssEnabled(false);
    wc.getOptions().setThrowExceptionOnScriptError(false);
    wc.getOptions().setTimeout(10000);
    wc.getOptions().setDoNotTrackEnabled(false);
    HtmlPage page= wc.getPage(address);
    List<HtmlDivision> items=(List<HtmlDivision>)page.getByXPath(
            "/html/body/div[@id='wrapper']/div[@class='content_main']/div[@class='search_result']/div[@id='resultData']");
    System.out.println(items);
    if(items!=null && items.size()>0){
        HtmlDivision resultMain=items.get(0);
        List<HtmlDivision> appDivList=(List<HtmlDivision>)resultMain.getByXPath(".//div[contains(@class,'search_one')]");
        System.out.println(appDivList);
        for(HtmlDivision resultItem:appDivList){
            try{
                DomElement appImgInfo=resultItem.getFirstElementChild();
                List<HtmlDivision> appInfoList=(List<HtmlDivision>)resultItem.getByXPath("./div[@class='one_right']");
                String appName=null;

The problem is that this code works fine when I step through it in the debugger, but when I run it normally, the line

    List<HtmlDivision> appDivList=(List<HtmlDivision>)resultMain.getByXPath(".//div[contains(@class,'search_one')]");

doesn't work: appDivList is empty. When I debug the same code, appDivList is not empty. Does anyone know why?


Update:

I added a Thread.sleep call before this line:

    List<HtmlDivision> appDivList=(List<HtmlDivision>)resultMain.getByXPath(".//div[contains(@class,'search_one')]");

The updated code is:

        HtmlDivision resultMain=items.get(0);
        try{
            Thread.sleep(10000);
        }catch(Exception e){}
        List<HtmlDivision> appDivList=(List<HtmlDivision>)resultMain.getByXPath(".//div[contains(@class,'search_one')]");
        System.out.println(appDivList);

With the sleep in place, it works. Why does this happen?
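
One plausible explanation, which nothing in this thread confirms, is that the search_one divs are inserted by JavaScript that is still running in the background when getPage returns; a debugger pause or a Thread.sleep would then simply give that script time to finish. Under that assumption, the fixed 10-second sleep could be replaced by polling until the query returns something or a timeout expires. The sketch below uses only the JDK; PollingWait and waitForNonEmpty are hypothetical names, and the fakeDom list stands in for the page. With HtmlUnit, the supplier would wrap the getByXPath call, and WebClient.waitForBackgroundJavaScript(timeoutMillis) may be worth trying as a built-in alternative.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Supplier;

// Hypothetical helper (not part of HtmlUnit): retry a query until it yields results.
final class PollingWait {

    /**
     * Calls query repeatedly until it returns a non-empty list or
     * timeoutMs has elapsed. Returns an empty list on timeout.
     */
    static <T> List<T> waitForNonEmpty(Supplier<List<T>> query, long timeoutMs, long intervalMs)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            List<T> result = query.get();
            if (result != null && !result.isEmpty()) {
                return result;
            }
            if (System.nanoTime() >= deadline) {
                return Collections.emptyList();
            }
            Thread.sleep(intervalMs); // short pause between attempts
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the DOM: a background thread adds the element after
        // 300 ms, imitating nodes created by an asynchronous script.
        List<String> fakeDom = new CopyOnWriteArrayList<>();
        Thread loader = new Thread(() -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            fakeDom.add("search_one");
        });
        loader.start();

        // Poll every 50 ms for up to 5 s instead of sleeping a fixed 10 s.
        List<String> found = waitForNonEmpty(() -> fakeDom, 5000, 50);
        System.out.println(found);
        loader.join();
    }
}
```

In the code from the question, the call might look like waitForNonEmpty(() -> (List<HtmlDivision>) resultMain.getByXPath(".//div[contains(@class,'search_one')]"), 10000, 200), which returns as soon as the divs appear rather than always waiting the full 10 seconds.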

chou
  • Maybe it's a version issue. Have you confirmed it with the newest version? – SkorpEN Apr 20 '16 at 11:42
  • I updated HtmlUnit to the latest version, 2.21, but it still doesn't work without the sleep code. – chou Apr 21 '16 at 06:55
  • Have you tried it with a CSS selector? – SkorpEN Apr 21 '16 at 11:56
  • I tried a CSS selector; my code is: DomNodeList appDivList=resultMain.querySelectorAll("div.search_one"); However, it still doesn't work without the Thread.sleep code, and it works once the Thread.sleep code is added. – chou Apr 22 '16 at 01:32
  • Does it work with another browser version without the sleep? – SkorpEN Apr 22 '16 at 07:34
  • I tried every browser version included in the newest HtmlUnit. Here is my result: only CHROME works without the Thread.sleep code, while BEST_SUPPORTED, EDGE, FIREFOX_38, FIREFOX_45, INTERNET_EXPLORER and INTERNET_EXPLORER_11 do not. All of them work with the Thread.sleep code. – chou Apr 24 '16 at 02:22
  • Have you searched the known HtmlUnit issues? – SkorpEN Apr 25 '16 at 11:34

0 Answers