
I cannot, for the life of me, rig HtmlUnit up to grab this site:

http://www.bing.com/travel/flight/flightSearch?form=FORMTRVLGENERIC&q=flights+from+SLC+to+BKK+leave+07%2F30%2F2010+return+08%2F11%2F2010+adults%3A1+class%3ACOACH&stoc=0&vo1=Salt+Lake+City%2C+UT+%28SLC%29+-+Salt+Lake+City+International+Airport&o=SLC&ve1=Bangkok%2C+Thailand+%28BKK%29+-+Suvarnabhumi+International&e=BKK&d1=07%2F30%2F2010&r1=08%2F11%2F2010&p=1&b=COACH&baf=true

I'm sure it has to do with the vast amounts of scripts running in the background. Perhaps these scripts aren't being given enough time to fully load?

I've also tried simply grabbing bing.com/travel, with no success either. It breaks inside the getPage call on the WebClient, before I ever get an HtmlPage back.

The output gives a plethora of runtimeErrors ("data necessary to complete this operation is not yet available"), all for the same sourceName ("http://www.bing.com/travel/jsxc.vjs?a=common&v=5.5.0-1278007084280")

Then a couple of exceptions are thrown for a missing "(" in a couple of scripts on bing.com.

Then it calls javascript, then abruptly ends.

I realize this could be any of a handful of problems that others might not be able to see. If there are no suggestions, would someone mind running these two URLs through their own HtmlUnit setup to see whether they get any output? I'm not trying to do anything fancy here, just get some basic text or XML output of the results.

It'd be handy to know if someone else's implementation works so I can keep jury-rigging mine to completion.

CODE:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class test {

    public static void main(String[] args) throws Exception {
        WebClient client = new WebClient();
        System.out.println("webclient loaded");

        // getPage() is the call that aborts with the script errors described above.
        HtmlPage currentPage = client.getPage("http://www.bing.com/travel/flight/flightSearch?form=FORMTRVLGENERIC&q=flights+from+SLC+to+BKK+leave+07%2F30%2F2010+return+08%2F11%2F2010+adults%3A1+class%3ACOACH&stoc=0&vo1=Salt+Lake+City%2C+UT+%28SLC%29+-+Salt+Lake+City+International+Airport&o=SLC&ve1=Bangkok%2C+Thailand+%28BKK%29+-+Suvarnabhumi+International&e=BKK&d1=07%2F30%2F2010&r1=08%2F11%2F2010&p=1&b=COACH&baf=true");
        // Give background JavaScript up to 10 seconds to finish.
        client.waitForBackgroundJavaScript(10000);
        System.out.println("htmlpage init'd");

        // System.out.println(currentPage.getTitleText());
        String textSource = currentPage.asXml();
        System.out.println(textSource);
    }
}

Thanks!

Stu Kalide

3 Answers


Try adding this:

client.setThrowExceptionOnScriptError( false ) ;

It takes a long time to run, and boy does it spew out logging... but eventually a page came out:

htmlpage init'd
<?xml version="1.0" encoding="utf-8"?>
<html id="">
  <head>
   ...
Rodney Gitzel
  • well son of a gun... thanks! so is it worth going through to fix the errors and warnings? as long as I get a page out, maybe it's not worth the effort... – Stu Kalide Jul 31 '10 at 01:42
  • From what I recall a lot of it was just logging info. That's typical of my HtmlUnit tests, the console spews like crazy. If the page comes out, don't worry about it. – Rodney Gitzel Jul 31 '10 at 03:19
  • I just want to confirm that adding that line above really DOES work. I've been having the same problem too--getting an error that says I'm loading an obsolete JS content during page load. Then eventually, during automated form submission, the error is that the JS content isn't available yet. The same logs still appear, but at least the RuntimeException that gets thrown and stops the entire execution is gone. However, I think that's only because the JS that was being loaded isn't necessary for me to complete form input and submission in the first place. – Matthew Quiros Jul 29 '12 at 07:56
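
On the logging noise mentioned in the comments above: if the spew gets in the way, one option is to turn the HtmlUnit loggers off. This is only a sketch under an assumption the thread does not confirm, namely that HtmlUnit's logging (done through Apache Commons Logging) ends up in java.util.logging, which is the usual fallback when no Log4j is on the classpath. The class name QuietHtmlUnitLogging is made up for illustration.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogging {

    // Keep a strong reference: java.util.logging holds loggers weakly, so a
    // logger that is configured and then dropped can be garbage-collected
    // and come back later with its level reset.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    /** Discards everything logged under the HtmlUnit package name. */
    public static void silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        silence();
        // With the level OFF, even SEVERE records are rejected.
        System.out.println(HTMLUNIT_LOGGER.isLoggable(Level.SEVERE)); // prints "false"
    }
}
```

Call silence() before creating the WebClient. This only hides log output; it does not stop a script error from aborting getPage, so the setThrowExceptionOnScriptError( false ) line is still needed. If your setup routes Commons Logging to Log4j instead, the level has to be set in the Log4j configuration.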

I also had the problem with "data necessary to complete this operation is not yet available".
Switching the user-agent to "Firefox" helped...
http://steveliles.github.com/jquery_htmlunit_runtimeerror_messages_galore.html

Alexander Link

Browsers have a high tolerance for what they might detect as errors (in JavaScript, but also HTML, CSS and so on). This is partly because of the various conflicting "standards" :) for how JavaScript got implemented. Something that works fine in one browser causes problems in another. So when all these messages are made visible, it can be a little disconcerting.

To put this in perspective: in Internet Explorer, go into your settings and, under "Advanced Settings", check "Display a notification about every script error", then browse the same sites. You might be surprised at how much code IE gets through just by ignoring what it detects as problems.

Using HtmlUnit under various browsers just brings some of these conflicts to light.

Telling HtmlUnit to do something like "Ignore...for this browser" is a perfectly valid practice. In my case, I am bringing in data from a site that checks that all of its users are on Internet Explorer (no, I have no good idea why they do that), so I can't proceed without ignoring the JavaScript errors. Interestingly, the site works fine even though IE thinks there are lots of JavaScript errors.

Pete Kelley