I'm using Selenium with the HtmlUnit driver (JavaScript enabled) to read websites from Python. Unfortunately, I'm running into problems with websites whose JavaScript isn't the cleanest. For example:

from selenium import webdriver

try:
    browser = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
    browser.get('https://www.ebay.com/')
    browser.close()
    print('success')
except Exception as e:
    print(e)

This raises an exception in Python: the JavaScript error from the page appears to be passed back through the webdriver. This does not happen with the Chrome, Firefox, or IE webdrivers.

Exception e:

TypeError: Cannot read property "classList" from undefined (script in https://www.ebay.com/ from (46, 26) to (73, 78)#70)
Stacktrace:
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError (ScriptRuntime.java:4130)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.constructError (ScriptRuntime.java:4108)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError (ScriptRuntime.java:4141)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.typeError2 (ScriptRuntime.java:4160)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.undefReadError (ScriptRuntime.java:4173)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.getObjectProp (ScriptRuntime.java:1528)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1245)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.iterativeMethod (NativeArray.java:1671)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.execIdCall (NativeArray.java:353)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call (IdFunctionObject.java:101)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1484)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.iterativeMethod (NativeArray.java:1671)
at net.sourceforge.htmlunit.corejs.javascript.NativeArray.execIdCall (NativeArray.java:353)
at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call (IdFunctionObject.java:101)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop (Interpreter.java:1484)
at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret (Interpreter.java:815)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call (InterpretedFunction.java:111)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall (ContextFactory.java:417)
at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall (HtmlUnitContextFactory.java:325)
at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall (ScriptRuntime.java:3424)
at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec (InterpretedFunction.java:122)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun (JavaScriptEngine.java:781)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run (JavaScriptEngine.java:895)
at net.sourceforge.htmlunit.corejs.javascript.Context.call (Context.java:599)
at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call (ContextFactory.java:527)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:790)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:766)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute (JavaScriptEngine.java:757)
at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScript (HtmlPage.java:920)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded (HtmlScript.java:316)
at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded (HtmlScript.java:396)
at com.gargoylesoftware.htmlunit.html.HtmlScript$2.execute (HtmlScript.java:246)
at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage (HtmlScript.java:267)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement (HTMLParser.java:805)
at org.apache.xerces.parsers.AbstractSAXParser.endElement (None:-1)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement (HTMLParser.java:761)
at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.callEndElement (HTMLTagBalancer.java:1236)
at net.sourceforge.htmlunit.cyberneko.HTMLTagBalancer.endElement (HTMLTagBalancer.java:1136)
at net.sourceforge.htmlunit.cyberneko.filters.DefaultFilter.endElement (DefaultFilter.java:226)
at net.sourceforge.htmlunit.cyberneko.filters.NamespaceBinder.endElement (NamespaceBinder.java:345)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scanEndElement (HTMLScanner.java:3178)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner$ContentScanner.scan (HTMLScanner.java:2141)
at net.sourceforge.htmlunit.cyberneko.HTMLScanner.scanDocument (HTMLScanner.java:945)
at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse (HTMLConfiguration.java:521)
at net.sourceforge.htmlunit.cyberneko.HTMLConfiguration.parse (HTMLConfiguration.java:472)
at org.apache.xerces.parsers.XMLParser.parse (None:-1)
at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse (HTMLParser.java:1004)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parse (HTMLParser.java:253)
at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml (HTMLParser.java:195)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage (DefaultPageCreator.java:267)
at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage (DefaultPageCreator.java:158)
at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto (WebClient.java:524)
at com.gargoylesoftware.htmlunit.WebClient.getPage (WebClient.java:398)
at com.gargoylesoftware.htmlunit.WebClient.getPage (WebClient.java:315)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.get (HtmlUnitDriver.java:670)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.lambda$get$8 (HtmlUnitDriver.java:657)
at org.openqa.selenium.htmlunit.HtmlUnitDriver.lambda$runAsync$0 (HtmlUnitDriver.java:414)
at java.lang.Thread.run (None:-1)

I found the following for Java, which looks like it should work:

WebClient client = new WebClient();
client.getOptions().setThrowExceptionOnScriptError(false);

I cannot figure out how to do the same in Python. Any advice?

A Gregory

1 Answer

A custom error handler appears to solve the problem. Subclass Selenium's ErrorHandler, override check_response so that it swallows the exception, and assign an instance to the driver:

from selenium import webdriver
from selenium.webdriver.remote.errorhandler import ErrorHandler

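class MyHandler(ErrorHandler):
    def check_response(self, response):
        # Run the normal response check, but swallow whatever it raises so that
        # JavaScript errors reported by HtmlUnit do not abort the session.
        # Note: this also hides every other WebDriver error.
        try:
            super(MyHandler, self).check_response(response)
        except Exception:
            pass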

try:
    browser = webdriver.Remote(desired_capabilities=webdriver.DesiredCapabilities.HTMLUNITWITHJS)
    browser.error_handler = MyHandler()
    browser.get('https://www.ebay.com/')
    browser.close()
    print('success')
except Exception as e:
    print(e)
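
Because the handler above silences every error, a narrower variant may be worth considering. The following is only a sketch, not part of Selenium: the names SCRIPT_ERROR_MARKERS, is_script_error and make_tolerant_check are hypothetical, and the marker strings are an assumption taken from the traceback in the question, so they may need adjusting for other HtmlUnit versions. It wraps any check_response-style callable and ignores only exceptions whose text looks like a page script failure, re-raising everything else.

```python
import logging

logger = logging.getLogger("htmlunit_js")

# Assumption: these substrings are copied from the traceback in the question.
# They are a heuristic, not an official way to identify script errors.
SCRIPT_ERROR_MARKERS = (
    "net.sourceforge.htmlunit.corejs.javascript",
    "(script in ",
)


def is_script_error(exc):
    """Return True if the exception text looks like an HtmlUnit JavaScript failure."""
    text = str(exc)
    return any(marker in text for marker in SCRIPT_ERROR_MARKERS)


def make_tolerant_check(check_response):
    """Wrap a check_response callable so that only script errors are ignored."""

    def tolerant_check(response):
        try:
            check_response(response)
        except Exception as exc:
            if not is_script_error(exc):
                raise  # unrelated errors still propagate to the caller
            lines = str(exc).splitlines() or [""]
            logger.warning("Ignoring page script error: %s", lines[0])

    return tolerant_check
```

With a driver created as in the example above, it could be hooked in with something like browser.error_handler.check_response = make_tolerant_check(browser.error_handler.check_response), placed before the call to browser.get. That line is untested against a live HtmlUnit server.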
A Gregory