
I am writing a program that needs to read data from an input text file and save variables while going through the data. I am using HtmlUnit, and am running into the error:

com.gargoylesoftware.htmlunit.ScriptException: Exception invoking open
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine$HtmlUnitContextAction.run(JavaScriptEngine.java:684)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:602)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:507)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:616)
    at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.callFunction(JavaScriptEngine.java:591)
    at com.gargoylesoftware.htmlunit.html.HtmlPage.executeJavaScriptFunctionIfPossible(HtmlPage.java:985)
    at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeEventHandler(EventListenersContainer.java:210)
    at com.gargoylesoftware.htmlunit.javascript.host.EventListenersContainer.executeBubblingListeners(EventListenersContainer.java:230)
    at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:804)
    at com.gargoylesoftware.htmlunit.javascript.host.Node.fireEvent(Node.java:738)
    at com.gargoylesoftware.htmlunit.html.HtmlElement$1.run(HtmlElement.java:869)
    at net.sourceforge.htmlunit.corejs.javascript.Context.call(Context.java:602)
    at net.sourceforge.htmlunit.corejs.javascript.ContextFactory.call(ContextFactory.java:507)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.fireEvent(HtmlElement.java:874)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.doClickFireClickEvent(HtmlElement.java:1311)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1253)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1205)
    at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1160)
    at Docketscraper.scrapeWebsite(Docketscraper.java:58)
    at Docketscraper.starter(Docketscraper.java:40)

My code to do this is:

  private static String startingMonth;
  private static String startingDay;
  private static String startingYear;
  private static String endingMonth;
  private static String endingDay;
  private static String endingYear;

  public static void starter() throws IOException{
    Scanner sc = new Scanner("inputfile.txt").useDelimiter("\\s*|/");
    while(sc.hasNext()) {
      startingMonth = sc.next();
      startingDay = sc.next();
      startingYear = sc.next();
      // skip "to"
      sc.next();
      endingMonth = sc.next();
      endingDay = sc.next();
      endingYear = sc.next();
      scrapeWebsite();
    }
  }

where scrapeWebsite runs the HtmlUnit code. The scrapeWebsite method, which calls another method to parse the data, is as follows:

  public static void scrapeWebsite() throws IOException {
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage(url);
    final HtmlForm form = page.getForms().get(0);
    final HtmlElement button = form.getElementById("SheetContentPlaceHolder_C_searchresults_lbPrint");
    final HtmlPage page2 = button.click();
    try {
      synchronized (page2) {
        page2.wait(10000);
      }
    } catch (InterruptedException e) {
      System.out.println("error");
    }
    originalHtml = page2.getWebResponse().getContentAsString();
    obtainInformation();
    originalHtml = "";
  }

The input variables are used to build the search URL with this code:

private static String url = "http://cpdocket.cp.cuyahogacounty.us/SheriffSearch/results.aspx?q=searchType%3dCity%26searchString%3d%26foreclosureType%3d%26dateFrom%3d" + startingMonth + "%2f" + startingDay + "%2f" + startingYear + "+12%3a00%3a00+AM%26dateTo%3d" + endingMonth + "%2f" + endingDay + "%2f" + endingYear + "+11%3a59%3a59+PM";

This URL is specific to the website. I believe the issue is with the Scanner, because when I manually set the 6 variables and run "scrapeWebsite", I get the correct output. However, I cannot get even one set of input dates to run. The dates are in the format:

1/1/2013 to 1/7/2013

I am not sure what the problem with the "starter" method is.
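One more thing worth checking, assuming url is declared exactly as shown above: a static field initializer runs only once, when the class is initialized, so url is built from whatever the six variables hold at that moment (null, since they have no initializers). Assigning them later inside the while loop does not rebuild the string. The sketch below demonstrates that with a hypothetical StaticInitDemo class and builds the URL inside a method instead, so it is recomputed for every date range. SheriffSearchUrl and build are illustrative names, not part of the original code.

```java
// Hypothetical demo: a static initializer captures the values present at class-initialization time.
final class StaticInitDemo {
    static String startingMonth;                      // no initializer, so it starts as null
    static String url = "dateFrom=" + startingMonth;  // evaluated once, when the class is initialized
}

// Hypothetical helper: builds the search URL from the six date parts on every call.
final class SheriffSearchUrl {
    private static final String BASE =
        "http://cpdocket.cp.cuyahogacounty.us/SheriffSearch/results.aspx"
        + "?q=searchType%3dCity%26searchString%3d%26foreclosureType%3d%26dateFrom%3d";

    static String build(String startMonth, String startDay, String startYear,
                        String endMonth, String endDay, String endYear) {
        // Same pieces as the original field; %2f is an encoded "/" and %3a an encoded ":".
        return BASE + startMonth + "%2f" + startDay + "%2f" + startYear
            + "+12%3a00%3a00+AM%26dateTo%3d"
            + endMonth + "%2f" + endDay + "%2f" + endYear
            + "+11%3a59%3a59+PM";
    }

    public static void main(String[] args) {
        StaticInitDemo.startingMonth = "1";      // triggers class init first, then assigns
        System.out.println(StaticInitDemo.url);  // prints dateFrom=null
        System.out.println(build("1", "1", "2013", "1", "7", "2013"));
    }
}
```

With a helper like this, scrapeWebsite could pass SheriffSearchUrl.build(startingMonth, startingDay, startingYear, endingMonth, endingDay, endingYear) to webClient.getPage(...) instead of reading the static url field.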

Ctech45

2 Answers


There is no issue with the Scanner itself. The problem is that the page contains JavaScript that is not well-formed. HtmlUnit tries to parse it, fails, and throws an exception.

I've added some ideas on this issue to this answer.

Apart from that, you can always suppress the exception by calling setThrowExceptionOnScriptError(false) on the WebClient (in newer HtmlUnit versions, through webClient.getOptions()).

This gets you past the exception but does not fix anything in the JavaScript code. If the failing JavaScript function happens to be a critical part of your data-extraction process, you will have no choice but to stop relying on HtmlUnit's JavaScript handling and code the AJAX requests yourself. On the other hand, if the function has nothing to do with the processing you need, this will most likely work.

This issue is very common when web scraping with HtmlUnit.

Mosty Mostacho

The first problem I noticed was the line

Scanner sc = new Scanner("inputfile.txt").useDelimiter("\\s*|/");

As written, the Scanner reads the text "inputfile.txt" itself, not the contents of that file. Try replacing the string with new File("inputfile.txt"). If you use this class from another class, it is better to give the whole path, e.g. "C:\\programdata\\Connors file\\inputfile.txt" (this is just an example; backslashes must be doubled inside a Java string literal). The easiest way to get the directory is to go to the folder the file is in, right-click the file, open Properties, copy the location, and add \inputfile.txt. Please let me know if this helps.

P.S. In case of emergency: Scanner sc = new Scanner(new File("inputfile.txt")).useDelimiter("delimiter");
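To make the fix concrete, here is a minimal sketch, assuming each line of inputfile.txt looks like "1/1/2013 to 1/7/2013". It reads the file through new Scanner(new File(...)) as suggested above, and it also swaps the delimiter for "[\\s/]+". The original "\\s*|/" can match an empty string, and a delimiter that matches nothing can make Scanner hand back one character at a time instead of whole numbers. DateRange, DateRangeReader, parse and parseFile are hypothetical names for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical holder for one "M/D/YYYY to M/D/YYYY" line.
final class DateRange {
    final String startMonth, startDay, startYear;
    final String endMonth, endDay, endYear;

    DateRange(String startMonth, String startDay, String startYear,
              String endMonth, String endDay, String endYear) {
        this.startMonth = startMonth;
        this.startDay = startDay;
        this.startYear = startYear;
        this.endMonth = endMonth;
        this.endDay = endDay;
        this.endYear = endYear;
    }
}

final class DateRangeReader {
    // One or more whitespace characters and/or slashes form a single delimiter.
    // Unlike "\\s*|/", this pattern can never match the empty string.
    private static final String DELIMITER = "[\\s/]+";

    // Reads every date range from an already-open Scanner.
    static List<DateRange> parse(Scanner sc) {
        sc.useDelimiter(DELIMITER);
        List<DateRange> ranges = new ArrayList<>();
        while (sc.hasNext()) {
            String sm = sc.next(), sd = sc.next(), sy = sc.next();
            sc.next(); // skip the literal word "to"
            String em = sc.next(), ed = sc.next(), ey = sc.next();
            ranges.add(new DateRange(sm, sd, sy, em, ed, ey));
        }
        return ranges;
    }

    // Scanner(File) reads the file's contents; Scanner(String) would scan the path text itself.
    static List<DateRange> parseFile(String path) throws FileNotFoundException {
        try (Scanner sc = new Scanner(new File(path))) {
            return parse(sc);
        }
    }

    public static void main(String[] args) {
        // Here Scanner(String) is used on purpose: the string IS the content to scan.
        List<DateRange> ranges = parse(new Scanner("1/1/2013 to 1/7/2013\n"));
        DateRange r = ranges.get(0);
        System.out.println(r.startMonth + "/" + r.startDay + "/" + r.startYear
            + " -> " + r.endMonth + "/" + r.endDay + "/" + r.endYear); // prints 1/1/2013 -> 1/7/2013
    }
}
```

In "starter", the loop could then iterate over DateRangeReader.parseFile("inputfile.txt"), copy each range's six values into the static fields, and call scrapeWebsite(). A line without all seven tokens makes sc.next() throw NoSuchElementException, which is usually easier to spot than silently wrong values.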

kyle england