I am having a hard time with my NekoHTML parser. It works fine on URLs, but when I test it on a simple XML text sample, it does not read it properly.
Here is how I declare it:
import org.apache.xerces.xni.parser.XMLDocumentFilter
import org.cyberneko.html.filters.Purifier
import org.cyberneko.html.parsers.SAXParser

def createAndSetParser() {
    SAXParser parser = new SAXParser() // Default NekoHTML SAX parser
    def charset = "Windows-1252" // The encoding of the page
    def tagFormat = "upper" // Ensures all tags are written consistently by putting them in upper case. We can choose "lower", "upper" or "match"
    def attrFormat = "lower" // Same thing for attributes. We can choose "upper", "lower" or "match"
    Purifier purifier = new Purifier() // Creating a purifier, in order to clean the incoming HTML
    XMLDocumentFilter[] filter = [purifier] // Creating a filter chain and adding the purifier to it (NekoHTML feature)
    parser.setProperty("http://cyberneko.org/html/properties/filters", filter)
    parser.setProperty("http://cyberneko.org/html/properties/default-encoding", charset)
    parser.setProperty("http://cyberneko.org/html/properties/names/elems", tagFormat)
    parser.setProperty("http://cyberneko.org/html/properties/names/attrs", attrFormat)
    parser.setFeature("http://cyberneko.org/html/features/scanner/ignore-specified-charset", true) // Forces the parser to ignore any charset declared in the document and use the one we provided
    parser.setFeature("http://cyberneko.org/html/features/override-doctype", false) // Leaves the DOCTYPE as it is
    parser.setFeature("http://cyberneko.org/html/features/override-namespaces", false) // Makes sure no namespace is added or overridden
    parser.setFeature("http://cyberneko.org/html/features/balance-tags", true) // Lets NekoHTML balance the tags of the parsed document
    return new XmlSlurper(parser) // A Groovy parser that does not load the whole tree structure, but rather supplies only the information it is asked for
}
Again, it works fine when I use it on websites. Any idea why it does not work on simple XML text samples?
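For reference, here is a minimal sketch of the kind of XML text test I mean. The sample string and the names `sample`, `neko` and `doc` are made up for illustration. It assumes NekoHTML and Xerces are on the classpath, and it sets only the tag-case property to keep it short. On recent Groovy versions an explicit `import groovy.xml.XmlSlurper` may be needed.

```groovy
import org.cyberneko.html.parsers.SAXParser

// Hypothetical sample input, not taken from a real page.
def sample = '<root><item>first</item><item>second</item></root>'

// Same parser class as in createAndSetParser(), reduced to one property for brevity.
def neko = new SAXParser()
neko.setProperty("http://cyberneko.org/html/properties/names/elems", "upper")

// Parse the string instead of a URL.
def doc = new XmlSlurper(neko).parseText(sample)

// Print every element name the parser actually produced.
doc.depthFirst().each { println it.name() }
```

Comparing the printed element names with the tags in the input string should show how the parser restructured the text.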
Any help is greatly appreciated :)