Following the instructions in the Saxonica documentation I have code that works great for opening an XML file.

But when I use that same code to open a JSON file I get:

Caused by: net.sf.saxon.s9api.SaxonApiException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:360)
    at net.windward.datasource.xml.SaxonDataSource.ctor(SaxonDataSource.java:231)
    at net.windward.datasource.xml.SaxonDataSource.<init>(SaxonDataSource.java:154)
    ... 2 more
Caused by: net.sf.saxon.trans.XPathException: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:459)
    at net.sf.saxon.event.Sender.send(Sender.java:142)
    at net.sf.saxon.Configuration.buildDocumentTree(Configuration.java:4184)
    at net.sf.saxon.s9api.DocumentBuilder.build(DocumentBuilder.java:357)
    ... 4 more
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
    at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1472)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(XMLDocumentScannerImpl.java:994)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:602)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:505)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:841)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:770)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
    at net.sf.saxon.event.Sender.sendSAXSource(Sender.java:439)
    ... 7 more

What do I need to do differently?

David Thielen

1 Answer

You haven't told us what you're doing, so it's hard to say what you should do differently. But clearly you're supplying a JSON file to an interface that expects XML.

The simplest way to load JSON is to use the XPath 3.1 `json-doc()` function.
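As a rough illustration of calling `json-doc()` from Java, the sketch below evaluates an XPath expression through the s9api `XPathCompiler` and binds the file's URI to a variable. It assumes Saxon 9.8 or later on the classpath; the class name `JsonDocExample`, the helper `evaluate`, and the variable `$uri` are hypothetical names chosen for this example, not part of Saxon.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmMap;
import net.sf.saxon.s9api.XdmValue;

public final class JsonDocExample {

    /**
     * Evaluates an XPath 3.1 expression with $uri bound to the absolute URI
     * of the given file. The expression must reference $uri, for example
     * "json-doc($uri)". (Hypothetical helper, not a Saxon API.)
     */
    public static XdmValue evaluate(Path jsonFile, String xpath) throws SaxonApiException {
        Processor processor = new Processor(false);
        XPathCompiler compiler = processor.newXPathCompiler();
        compiler.setLanguageVersion("3.1");

        QName uri = new QName("uri");
        compiler.declareVariable(uri);

        XPathSelector selector = compiler.compile(xpath).load();
        selector.setVariable(uri, new XdmAtomicValue(jsonFile.toUri().toString()));
        return selector.evaluate();
    }

    public static void main(String[] args) throws Exception {
        // Write a small UTF-8 sample file so the example is self-contained.
        Path file = Files.createTempFile("sample", ".json");
        Files.write(file, "{\"name\":\"Saxon\",\"tags\":[\"xml\",\"json\"]}"
                .getBytes(StandardCharsets.UTF_8));

        // The root of a JSON object is a map, not a node.
        XdmValue root = evaluate(file, "json-doc($uri)");
        System.out.println("Root is a map: " + (root.itemAt(0) instanceof XdmMap));

        // The lookup operator ? selects an entry of the map.
        XdmValue name = evaluate(file, "json-doc($uri)?name");
        System.out.println(name.itemAt(0).getStringValue());
    }
}
```

Note that the result is an `XdmMap` (or an `XdmArray` for a top-level JSON array) rather than an `XdmNode`, so `DocumentBuilder` is not involved at any point.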

Michael Kay
  • In my Java code where/how do I call the json-doc() method? Do you have an example anywhere similar to the example at http://www.saxonica.com/documentation/index.html#!xpath-api/s9api-xpath for the code to load an XML file? – David Thielen Jul 17 '20 at 21:04
  • json-doc() is an XPath function. If you want to call it from Java you could do that by evaluating XPath. In Saxon you could also try calling the static method `net.sf.saxon.ma.json.ParseJsonFn.parse()` but you would have to work out how to set up the parameters. (Note: the javadoc for the method is wrong, the result will typically be a map or array, not an XML Element). – Michael Kay Jul 18 '20 at 06:49
  • You could also use `XdmFunctionItem.getSystemFunction()` to get the function as an `XdmFunctionItem`, and then `XdmFunctionItem.call()` to invoke it. – Michael Kay Jul 18 '20 at 06:56
  • Ok, I may have made a giant incorrect assumption. For an XML InputStream (file) I follow the suggested code to load it and when done, I have an XdmNode that is the root of the entire XML and an XPathCompiler I can use to run queries against the xml. I assumed JSON was the same, that I can read a JSON InputStream (file) in somehow and get an XdmNode & XPathCompiler. Is that not correct? And in that case, what exactly is "JSON support" in the context of XPath 3.1? – David Thielen Jul 18 '20 at 14:01
  • The s9api API provides a DocumentBuilder allowing you to parse lexical XML and build an XdmNode representing the parsed tree. It doesn't offer an equivalent for parsing JSON - that can only be done from within XPath. Omission noted. This is a gap in the s9api API, not in XPath. – Michael Kay Jul 18 '20 at 14:39
  • @DavidThielen, JSON in XPath 3.1 is represented by XdmMap or XdmArray, not by an XdmNode. If you expect JSON to be represented as an XdmNode you would first need to use `json-to-xml` on the JSON as a string, https://www.w3.org/TR/xpath-functions/#func-json-to-xml. I don't think s9api offers a way to do that other than using XPath to evaluate e.g. `json-to-xml($json)`, where you set the `$json` variable to the JSON string, or to evaluate `json-to-xml(unparsed-text($json-uri))`, where you set `$json-uri` to the location of the JSON file. – Martin Honnen Jul 18 '20 at 17:40
  • Michael, now this is all making sense. I was looking for something that is not there. So for now, we can't use Saxon/XPath 3.1 to handle JSON (no XML) datasources - correct? – David Thielen Jul 18 '20 at 18:31
  • And to @MartinHonnen's note: once Saxon does read in a JSON file, that will give us an XdmArray or XdmMap for the root of the document. Will we also be able to get an XPathCompiler for that document to perform queries? TIA – David Thielen Jul 18 '20 at 18:32
  • @DavidThielen, you can query maps and arrays, they are functions and there is the `?` lookup operator, see https://www.w3.org/TR/xpath-31/#id-maps-and-arrays – Martin Honnen Jul 18 '20 at 19:51
  • @DavidThielen yes, you can handle a JSON data source. Just pass it into the XPath expression as a string and call parse-json() within the XPath expression. What you can't do is to parse the JSON into maps and arrays before invoking XPath. – Michael Kay Jul 19 '20 at 11:02
  • Michael - is the JSON expected to be in UTF-8 in this case or can the encoding be specified? We can convert if needed, but I don't want to if that is unnecessary. – David Thielen Jul 20 '20 at 12:15
  • parse-json() expects a string of characters, so any decoding must already have been done. json-doc() expects the resource to be in UTF-8, as mandated by the JSON specs. – Michael Kay Jul 20 '20 at 15:15
  • Ok I tried this and am not understanding something. I created a Java sample and put up a new question at https://stackoverflow.com/questions/63100636/how-do-i-load-a-json-file-into-the-dom-in-saxon-running-in-java – David Thielen Jul 26 '20 at 13:22
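
The two routes described in the comments above can be sketched as follows: passing the JSON text into XPath as a string and calling `parse-json()` (queried with the `?` lookup operator), or calling `json-to-xml()` to obtain an `XdmNode` that can be queried with ordinary path expressions. This is a minimal sketch assuming Saxon 9.8 or later on the classpath; the class `ParseJsonExample`, the helpers `evaluateWithJson` and `toXml`, and the variable `$json` are hypothetical names used only for illustration.

```java
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmItem;
import net.sf.saxon.s9api.XdmNode;
import net.sf.saxon.s9api.XdmValue;

public final class ParseJsonExample {

    // Namespace of the XML vocabulary produced by json-to-xml().
    private static final String FN_NS = "http://www.w3.org/2005/xpath-functions";

    /**
     * Evaluates an XPath 3.1 expression with $json bound to the JSON text
     * as an xs:string. The expression must reference $json.
     * (Hypothetical helper, not a Saxon API.)
     */
    public static XdmValue evaluateWithJson(Processor processor, String jsonText, String xpath)
            throws SaxonApiException {
        XPathCompiler compiler = processor.newXPathCompiler();
        compiler.setLanguageVersion("3.1");
        QName json = new QName("json");
        compiler.declareVariable(json);
        XPathSelector selector = compiler.compile(xpath).load();
        selector.setVariable(json, new XdmAtomicValue(jsonText));
        return selector.evaluate();
    }

    /** Converts JSON text to a document node using json-to-xml(). */
    public static XdmNode toXml(Processor processor, String jsonText) throws SaxonApiException {
        return (XdmNode) evaluateWithJson(processor, jsonText, "json-to-xml($json)").itemAt(0);
    }

    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);
        String json = "{\"name\":\"Saxon\",\"tags\":[\"xml\",\"json\"]}";

        // Route 1: maps and arrays. Array positions are 1-based.
        XdmValue firstTag = evaluateWithJson(processor, json, "parse-json($json)?tags?1");
        System.out.println(firstTag.itemAt(0).getStringValue());

        // Route 2: an XdmNode tree. A separate compiler is used so that
        // no declared variable is left unbound when querying the node.
        XdmNode doc = toXml(processor, json);
        XPathCompiler nodeCompiler = processor.newXPathCompiler();
        nodeCompiler.declareNamespace("j", FN_NS);
        XdmItem name = nodeCompiler.evaluateSingle(
                "string(/j:map/j:string[@key='name'])", doc);
        System.out.println(name.getStringValue());
    }
}
```

Because `parse-json()` takes a string, any byte decoding has to happen in Java before the value is bound, which matches the encoding remark in the comments above.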