I'm using Selenium for Java and I'm having problems with the HTMLUnitDriver. No matter which website or which combination of dependencies I try, it crashes on almost any JavaScript, according to the console output. When I use PhantomJS instead, everything works just as it does with, e.g., Chrome or Firefox. I'm also not sure which dependencies I'm supposed to use for HTMLUnitDriver.

The following is supposed to give me the latest version of the HTMLUnitDriver:

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.5.3</version>
    <exclusions>
        <exclusion>
            <groupId>org.sourceforge.htmlunit</groupId>
            <artifactId>htmlunit</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.sourceforge.htmlunit</groupId>
            <artifactId>htmlunit-core</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.sourceforge.htmlunit</groupId>
            <artifactId>neko-htmlunit</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-htmlunit-driver</artifactId>
    <version>2.52.0</version>
</dependency>

However, it doesn't. HTMLUnitDriver still ends up bundled with net.sourceforge.htmlunit:htmlunit:2.27, net.sourceforge.htmlunit:htmlunit-core-js:2.27 and net.sourceforge.htmlunit:neko-htmlunit:2.27 despite the exclusions.

This repository, however, suggests that 2.27 is still the latest version, and that version handles JavaScript on websites so poorly that it's unusable.
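
One possible explanation, offered as an assumption rather than a confirmed diagnosis: Maven exclusions only take effect when the groupId and artifactId match the transitive dependency exactly. The exclusions above use `org.sourceforge.htmlunit` and `htmlunit-core`, while the bundled artifacts are listed as `net.sourceforge.htmlunit` and `htmlunit-core-js`. A sketch of the same block with coordinates copied from that list:

```xml
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>3.5.3</version>
    <exclusions>
        <!-- groupId/artifactId taken from the bundled 2.27 artifacts named above -->
        <exclusion>
            <groupId>net.sourceforge.htmlunit</groupId>
            <artifactId>htmlunit</artifactId>
        </exclusion>
        <exclusion>
            <groupId>net.sourceforge.htmlunit</groupId>
            <artifactId>htmlunit-core-js</artifactId>
        </exclusion>
        <exclusion>
            <groupId>net.sourceforge.htmlunit</groupId>
            <artifactId>neko-htmlunit</artifactId>
        </exclusion>
    </exclusions>
</dependency>
```

Matching the coordinates would only make the exclusions apply. It would not by itself guarantee that the remaining HtmlUnit version is compatible with the driver, and `mvn dependency:tree` can be used to check what is actually resolved.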

This is how I start it:

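import org.openqa.selenium.htmlunit.HtmlUnitDriver;
// Create the headless driver, then turn on JavaScript support.
HtmlUnitDriver unitDriver = new HtmlUnitDriver();
unitDriver.setJavascriptEnabled(true);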

Exception:

Caused by: com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function registerElement in object [object HTMLDocument]. (https://www.example.com/some-script.js#31)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:894)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:637)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:518)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:774)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:750)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:102)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.loadExternalJavaScriptFile(HtmlPage.java:991)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:366)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute(HtmlScript.java:247)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:268)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:800)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:756)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1236)
    at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1136)
    at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement(DefaultFilter.java:226)
    at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement(NamespaceBinder.java:345)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3178)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2141)
    at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument(HTMLScanner.java:945)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:521)
    at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse(HTMLConfiguration.java:472)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:999)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:250)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:192)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:272)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:160)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:522)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:396)
    at com.gargoylesoftware.htmlunit.WebClient.getPage(WebClient.java:313)
    at org.openqa.selenium.htmlunit.HtmlUnitDriver.get(HtmlUnitDriver.java:668)
    ... 3 more

Leaving JavaScript disabled avoids the exception, but the site needs JavaScript, so that's not a solution.

Is there anything wrong with my dependencies, or is HTMLUnitDriver really just "garbage"? The startup time of PhantomJS is about 5 seconds, which is pretty slow if you just want to parse something once, so a more lightweight driver like HTMLUnitDriver would come in handy if it worked...

BullyWiiPlaza

3 Answers

Please note the change of artifactId. The latest version, as hinted here, is:

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>htmlunit-driver</artifactId>
    <version>2.27</version>
</dependency>

which is based on Selenium 3.4.0.

You can use Selenium 3.6.0 with HtmlUnitDriver 2.28-SNAPSHOT.

As a starting point, I suggest that you reference HtmlUnitDriver only, so that it pulls in all its dependencies transitively, and then add the other drivers.
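
A minimal sketch of that layout, under stated assumptions: `selenium-chrome-driver` is only an illustrative choice of "other driver", and its version `3.4.0` is picked to line up with the Selenium version that HtmlUnitDriver 2.27 is said above to be based on. Neither choice comes from the original question.

```xml
<dependencies>
    <!-- HtmlUnitDriver first; HtmlUnit 2.27 and the Selenium API arrive transitively -->
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>htmlunit-driver</artifactId>
        <version>2.27</version>
    </dependency>

    <!-- Assumption: any further driver is added individually, aligned with Selenium 3.4.0 -->
    <dependency>
        <groupId>org.seleniumhq.selenium</groupId>
        <artifactId>selenium-chrome-driver</artifactId>
        <version>3.4.0</version>
    </dependency>
</dependencies>
```

Declaring drivers one by one, instead of the `selenium-java` aggregate, keeps the HtmlUnit version under the control of `htmlunit-driver` alone, which may make version conflicts easier to spot.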

Ahmed Ashour
  • Well, the readme hasn't been updated in 5 months and I'm already using `Selenium` with other drivers so I can't just remove it. I need `Selenium 3.5.3` due to a bug with `PhantomJSDriver`. – BullyWiiPlaza Oct 30 '17 at 21:06
  • Selenium `3.5.3` already includes HtmlUnit `2.27`, which is the latest released version. Regarding the JavaScript support, please read [here](http://htmlunit.sourceforge.net/submittingJSBugs.html) – Ahmed Ashour Oct 31 '17 at 06:20

The documentation suggests that from Selenium v2.53.0 onwards you need to add HtmlUnitDriver explicitly as a dependency in your Selenium Maven project. The driver's version number now tracks HtmlUnit itself.

Example:

<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>htmlunit-driver</artifactId>
    <version>3.6.0</version>
</dependency>

undetected Selenium
  • Yeah, but version `3.6.0`, for instance, doesn't exist. `LATEST` resolves to `2.27`, and that version is still really bad, as explained in the question. – BullyWiiPlaza Oct 30 '17 at 10:39

Errors like

com.gargoylesoftware.htmlunit.ScriptException: TypeError: Cannot find function registerElement in object [object HTMLDocument].

are usually the result of missing support for a specific JavaScript feature in HtmlUnit. If you would like to see this fixed you have to

And of course unit tests and/or patches are welcome.

RBRi
  • Not really. If the makers just used their `HTMLUnitDriver` on real-life websites, they would see plenty of issues on their own. They have had years to support `JavaScript` properly. `PhantomJSDriver` and all modern browsers work fine, so it can't be that hard to make things work. Still, thanks for letting me know what you can do about it. – BullyWiiPlaza Oct 31 '17 at 21:25