
I'm trying to access a web page from Java code using HtmlUnit. The page has a button that opens a new page when clicked. When I click it, the exception shown below appears at runtime. As far as I understand, it has something to do with illegal escape sequences in the page's HTML code.

Here is my code so far:

try(WebClient client = new WebClient(BrowserVersion.CHROME)){

    client.getOptions().setCssEnabled(false);
    WebRequest webRequest = new WebRequest(url);
    webRequest.setCharset("utf-8");
    HtmlPage entrypage = client.getPage(webRequest);
    HtmlInput dwnld = (HtmlInput) entrypage.getElementById("btn_download");

    // Give the page's background JavaScript up to 11 seconds to finish,
    // rather than spinning in a busy-wait loop that burns CPU.
    client.waitForBackgroundJavaScript(11000);

    if (dwnld != null) {
        System.out.println("Found btn_download");
        dwnld.click();
    }


} catch (FailingHttpStatusCodeException | IOException e ) {
    // TODO Auto-generated catch block

    e.printStackTrace();
}

Ideas anyone?

Here's the exception:

java.util.regex.PatternSyntaxException: Illegal octal escape sequence near index 2
\0+$
  ^
    at java.util.regex.Pattern.error(Pattern.java:1955)
    at java.util.regex.Pattern.o(Pattern.java:3192)
    at java.util.regex.Pattern.escape(Pattern.java:2300)
    at java.util.regex.Pattern.atom(Pattern.java:2198)
    at java.util.regex.Pattern.sequence(Pattern.java:2079)
    at java.util.regex.Pattern.expr(Pattern.java:1996)
    at java.util.regex.Pattern.compile(Pattern.java:1696)
    at java.util.regex.Pattern.<init>(Pattern.java:1351)
    at java.util.regex.Pattern.compile(Pattern.java:1054)
    at com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy.doAction(HtmlUnitRegExpProxy.java:102)
    at com.gargoylesoftware.htmlunit.javascript.regexp.HtmlUnitRegExpProxy.action(HtmlUnitRegExpProxy.java:74)
    at net.sourceforge.htmlunit.corejs.javascript.NativeString.execIdCall(NativeString.java:455)
    at net.sourceforge.htmlunit.corejs.javascript.IdFunctionObject.call(IdFunctionObject.java:89)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpretLoop(Interpreter.java:1531)
    at net.sourceforge.htmlunit.corejs.javascript.Interpreter.interpret(Interpreter.java:798)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.call(InterpretedFunction.java:105)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.doTopCall(ContextFactory.java:411)
    at com.gargoylesoftware.htmlunit.javascript.HtmlUnitContextFactory.doTopCall(HtmlUnitContextFactory.java:309)
    at net.sourceforge.htmlunit.corejs.javascript.ScriptRuntime.doTopCall(ScriptRuntime.java:3057)
    at net.sourceforge.htmlunit.corejs.javascript.InterpretedFunction.exec(InterpretedFunction.java:115)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$3.doRun(JavaScriptEngine.java:724)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:832)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:620)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:513)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:733)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.execute(JavaScriptEngine.java:708)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptIfPossible(HtmlPage.java:982)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeInlineScriptIfNeeded(HtmlScript.java:351)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.executeScriptIfNeeded(HtmlScript.java:411)
    at com.gargoylesoftware.htmlunit.html.HtmlScript$3.execute(HtmlScript.java:276)
    at com.gargoylesoftware.htmlunit.html.HtmlScript.onAllChildrenAddedToPage(HtmlScript.java:290)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:793)
    at org.apache.xerces.parsers.AbstractSAXParser.endElement(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.endElement(HTMLParser.java:751)
    at org.cyberneko.html.HTMLTagBalancer.callEndElement(HTMLTagBalancer.java:1170)
    at org.cyberneko.html.HTMLTagBalancer.endElement(HTMLTagBalancer.java:1072)
    at org.cyberneko.html.filters.DefaultFilter.endElement(DefaultFilter.java:206)
    at org.cyberneko.html.filters.NamespaceBinder.endElement(NamespaceBinder.java:330)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scanEndElement(HTMLScanner.java:3126)
    at org.cyberneko.html.HTMLScanner$ContentScanner.scan(HTMLScanner.java:2093)
    at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:920)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
    at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at com.gargoylesoftware.htmlunit.html.HTMLParser$HtmlUnitDOMBuilder.parse(HTMLParser.java:1017)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parse(HTMLParser.java:248)
    at com.gargoylesoftware.htmlunit.html.HTMLParser.parseHtml(HTMLParser.java:194)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createHtmlPage(DefaultPageCreator.java:268)
    at com.gargoylesoftware.htmlunit.DefaultPageCreator.createPage(DefaultPageCreator.java:156)
    at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseInto(WebClient.java:471)
    at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2110)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:875)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:962)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1327)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1270)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1218)
    at src.Hosts$1.exctractFileLinkFrom(Hosts.java:44)
    at src.TestMain.main(TestMain.java:10)
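
The top of the trace can be reproduced without HtmlUnit. This is a minimal sketch, assuming the page's script uses the JavaScript regex `\0+$` (trailing NUL characters) and that it is passed to `java.util.regex.Pattern` unchanged, as the `HtmlUnitRegExpProxy` frames suggest. In Java's regex syntax `\0` must be followed by an octal digit, so the pattern is rejected; `\x00` is one spelling of NUL that Java accepts. The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class OctalEscapeDemo {

    /** Returns true if java.util.regex accepts the given pattern text. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Same exception type as in the trace above.
            return false;
        }
    }

    /** Strips trailing NUL characters using a spelling Java accepts. */
    static String stripTrailingNuls(String input) {
        return input.replaceAll("\\x00+$", "");
    }

    public static void main(String[] args) {
        // The pattern from the trace: backslash, zero, plus, dollar.
        System.out.println("\\0+$ compiles:    " + compiles("\\0+$"));
        // The hexadecimal form of the same character.
        System.out.println("\\x00+$ compiles:  " + compiles("\\x00+$"));
        System.out.println("stripped: " + stripTrailingNuls("abc\0\0"));
    }
}
```

If this matches what happens on the page, the problem is a difference between JavaScript and Java regex syntax rather than anything wrong in the HTML itself.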

1 Answer


Possible solution?!

It is possible that the error here does not lie in the framework's HTML parsing. My guess is that it is not HtmlUnit itself that fails to parse illegal escape sequences, but its logger.

I didn't intend to solve the problem this way, but when I changed the loggers' level to SEVERE to clean up my console output, the exception no longer showed up.

// Logger here is java.util.logging.Logger
Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(java.util.logging.Level.SEVERE);
Logger.getLogger("org.apache.http").setLevel(java.util.logging.Level.SEVERE);

Is my guess correct, or is this just a coincidence?
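
One way to narrow this down is to look at what a log level can and cannot do. The sketch below uses only `java.util.logging`; the logger name, message, and exception are stand-ins, and it does not prove what HtmlUnit does internally. It shows that a WARNING record carrying an exception is printed like a stack trace when the level allows it and is silently dropped at SEVERE, so a changed level can hide a logged exception without preventing the underlying failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class LogLevelDemo {

    /** Collects published records in memory so the effect of the level is visible. */
    static class CollectingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    /** Logs one WARNING with an attached exception; returns how many records got through. */
    static int warningsSeenAt(Level level) {
        // Hypothetical logger name, used only for this demonstration.
        Logger logger = Logger.getLogger("demo.hypothetical.logger");
        CollectingHandler handler = new CollectingHandler();
        logger.setUseParentHandlers(false); // keep the console quiet
        logger.addHandler(handler);
        logger.setLevel(level);
        try {
            logger.log(Level.WARNING, "regex could not be compiled",
                    new IllegalArgumentException("stand-in for the real exception"));
            return handler.records.size();
        } finally {
            logger.removeHandler(handler);
        }
    }

    public static void main(String[] args) {
        System.out.println("records at ALL:    " + warningsSeenAt(Level.ALL));
        System.out.println("records at SEVERE: " + warningsSeenAt(Level.SEVERE));
    }
}
```

To tell a logged exception from a thrown one in the original code, it may help to check whether the statements after `dwnld.click()` still run while the trace is visible.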
