I have started writing some scripts in Groovy. I wrote this script, which basically parses an HTML page and does something with the data.

I use HTTPBuilder to perform the HTTP request. Whenever I try to execute this kind of request, I get this error:

Caught: java.lang.IllegalAccessError: tried to access class groovyx.net.http.StringHashMap from class groovyx.net.http.HTTPBuilder
java.lang.IllegalAccessError: tried to access class groovyx.net.http.StringHashMap from class groovyx.net.http.HTTPBuilder
    at groovyx.net.http.HTTPBuilder.<init>(HTTPBuilder.java:177)
    at groovyx.net.http.HTTPBuilder.<init>(HTTPBuilder.java:218)
    at Main$_main_closure1.doCall(Main.groovy:30)
    at Main.main(Main.groovy:24)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:143)

Here is the code of the main class:

// Grab HTTPBuilder component from maven repository
@Grab(group='org.codehaus.groovy.modules.http-builder',
        module='http-builder', version='0.5.2')
// import of HttpBuilder related stuff
import groovyx.net.http.*
import parsers.Parser
import parsers.WuantoParser
import parsers.Row

class Main {

    static mapOfParsers = [:]
    static void main(args) {
        List<Row> results = new ArrayList<>()

        // Initiating the parsers for the ebay-keywords websites
        println "Initiating Parsers..."
        initiateParsers()

        println "Parsing Websites..."
        mapOfParsers.each { key, parser ->
            switch (key) {
                case Constants.Parsers.WUANTO_PARSER:
                    println "Parsing Url: $Constants.Url.WUANTO_ROOT_CAT_URL"
                    println "Retrieving Html Content..."

                    def http = new HTTPBuilder(Constants.Url.WUANTO_ROOT_CAT_URL)
                    def html = http.get([:])

                    println "Parsing Html Content..."

                    results.addAll(((Parser) parser).parseHtml(html))
                    break
            }
        }

        results.each {
            println it
        }
    }

    static void initiateParsers() {
        mapOfParsers.put(Constants.Parsers.WUANTO_PARSER , new WuantoParser())
    }

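    static void writeToFile(List<Row> rows) {
        // A String cannot be assigned to a File directly; construct the File explicitly
        File file = new File("output.txt")

        rows.each {
            // append one row per line (write would overwrite the file on every iteration)
            file.append(it.toString() + "\n")
        }
    }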

}
David Lasry

1 Answer

Well, let's see. I tried running the code in your snippet, but version 0.5.2 of the http-builder dependency is quite outdated and was not available in the repositories my Groovy script was pointing at, so I replaced it with a more recent version, 0.7.1.

Also, the value of the html variable returned from http.get in your code is already parsed: it is not text but a Groovy NodeChild object. This is because HTTPBuilder parses HTML by default, and you have to explicitly tell it to return plain text if that is what you need (and even then it returns a reader rather than a string).

The following somewhat restructured and rewritten version of your code demonstrates the idea:

// Grab HTTPBuilder component from maven repository
@Grab('org.codehaus.groovy.modules.http-builder:http-builder:0.7.1')

import groovyx.net.http.*
import groovy.xml.XmlUtil
import static groovyx.net.http.ContentType.*

class MyParser { 
  def parseHtml(html) {
    [html]
  }
}


def mapOfParsers = [:]
mapOfParsers["WUANTO_PARSER"] = new MyParser()

result = []
mapOfParsers.each { key, parser ->
    switch (key) {
        case "WUANTO_PARSER":
            // just a sample url which returns some html data
            def url = "https://httpbin.org/links/10/0"

            def http = new HTTPBuilder(url)
            def html = http.get([:])

            // the object returned from http.get is of type 
            // http://docs.groovy-lang.org/latest/html/api/groovy/util/slurpersupport/NodeChild.html
            // this is a parsed format which is navigable in groovy 
            println "extracting HEAD.TITLE text: " + html.HEAD.TITLE.text()

            println "class of returned object ${html.getClass().name}"
            println "First 100 characters parsed and formatted:\n ${XmlUtil.serialize(html).take(100)}"

            // forcing the returned type to be plain text
            def reader = http.get(contentType : TEXT)

            // what is returned now is a reader, we can get the text in groovy 
            // via reader.text
            def text = reader.text
            println "Now we are getting text, 100 first characters plain text:\n ${text.take(100)}"

            result.addAll parser.parseHtml(text)
            break
    }
}

result.each { 
    println "result length ${it.length()}"
}

Running the above prints:

extracting HEAD.TITLE text: Links
class of returned object groovy.util.slurpersupport.NodeChild
First 100 characters parsed and formatted:
 <?xml version="1.0" encoding="UTF-8"?><HTML>
  <HEAD>
    <TITLE>Links</TITLE>
  </HEAD>
  <BODY>0 <
Now we are getting text, 100 first characters plain text:
 <html><head><title>Links</title></head><body>0 <a href='/links/10/1'>1</a> <a href='/links/10/2'>2</
result length 313

(with a couple of warnings from XmlUtil.serialize omitted for brevity).
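
If you want to get a feel for the two result kinds without making a network call, here is a minimal offline sketch. It is not HTTPBuilder itself: it assumes Groovy 3 or later (for the groovy.xml.XmlSlurper import), uses a made-up body string as a stand-in for the response, and uses plain XmlSlurper and StringReader to play the roles of the parsed result and the reader returned for contentType TEXT.

```groovy
// Assumption: Groovy 3+; on Groovy 2.x the class lives in groovy.util.XmlSlurper
import groovy.xml.XmlSlurper

// Hypothetical stand-in for a response body; nothing is fetched from a server
def body = "<html><head><title>Links</title></head><body>0 <a href='/links/10/1'>1</a></body></html>"

// Parsed form: a navigable GPathResult, similar in spirit to what http.get([:]) hands back.
// Plain XmlSlurper keeps tag names as written, whereas the HTTPBuilder output above
// shows upper-cased tag names (HEAD, TITLE).
def parsed = new XmlSlurper().parseText(body)
def title = parsed.head.title.text()
def firstHref = parsed.body.a[0].@href.text()

// Plain-text form: a Reader, whose full content Groovy exposes via the .text property
Reader reader = new StringReader(body)
def text = reader.text

println "title: $title"
println "first link: $firstHref"
println "plain text starts with: ${text.take(30)}"
```

The point of the sketch is only the shape of the two values: the parsed form is navigated by element name, while the reader has to be drained with .text before a string-based parser can use it.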

None of this explains why you are getting that exception, but perhaps the above can get you unblocked and past the issue.

Matias Bjarland