
In my Java code I am trying to create a Saxon document (DOM) that is the contents of a JSON file. This should be possible but the code I have fails.

The full code for this is at SaxonQuestions.zip, TestLoadJson.java and is also listed below. In this code the evaluate() fails.

TestLoadJson.java

import net.sf.saxon.Configuration;
import net.sf.saxon.s9api.*;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

import javax.xml.transform.sax.SAXSource;
import java.io.*;
import java.nio.charset.Charset;

public class TestLoadJson {
    public static void main(String[] args) throws Exception {

        // get the file
        File jsonFile = new File("files", "SouthWind.json");
        Charset inputCharset = Charset.forName("UTF-8");
        FileInputStream fis = new FileInputStream(jsonFile);
        InputStreamReader isr = new InputStreamReader(fis, inputCharset);
        BufferedReader br = new BufferedReader(isr);

        String str;
        StringBuilder buf = new StringBuilder();
        while ((str = br.readLine()) != null)
            buf.append(str).append('\n');

        br.close();
        isr.close();
        fis.close();

        // set up the compiler
        Configuration config = XmlDatasource.createEnterpriseConfiguration();
        Processor processor = new Processor(config);
        XPathCompiler xPathCompiler = processor.newXPathCompiler();

        // need an XML document
        DocumentBuilder doc_builder = processor.newDocumentBuilder();

        XMLReader reader = XmlDatasource.createXMLReader();

        InputSource xmlSource = new InputSource(new ByteArrayInputStream("<root/>".getBytes()));
        SAXSource saxSource = new SAXSource(reader, xmlSource);
        XdmNode xmlRootNode = doc_builder.build(saxSource);


        // give it the JSON
        buf.insert(0, "parse-json(");
        buf.append(")");
        Object json = xPathCompiler.evaluate(buf.toString(), xmlRootNode);

        System.out.println("JSON read in!!! json = " + json);
    }
}
David Thielen
  • Either pass in the file URI as a variable to XPath and use `json-doc($var)` or pass in your JSON as a string variable and use `parse-json($var)`. You don't really want to use string concatenation. If you wanted to you would need to wrap the JSON into an XPath string literal, but it would break with any JSON containing single quotes. – Martin Honnen Jul 26 '20 at 20:31
  • And note that it is not a DOM, in the XDM JSON objects are represented as `XdmMap`s and JSON arrays as `XdmArray`s, JSON numbers as `xs:double`s, JSON strings as `xs:string`s. – Martin Honnen Jul 26 '20 at 20:34

1 Answer


If you have a Java String with JSON, pass it in as a variable to XPath and call parse-json on the variable:

    Processor processor = new Processor(true);
    
    String[] jsonExamples = { "1", "true", "null", "\"string\"", "[1,2,3]", "{ \"prop\" : \"value\" }" };
    
    XPathCompiler compiler = processor.newXPathCompiler();
    
    compiler.declareVariable(new QName("json"));
    
    XPathExecutable executable = compiler.compile("parse-json($json)");
    
    XPathSelector selector = executable.load();
    
    for (String json : jsonExamples) {
        selector.setVariable(new QName("json"), new XdmAtomicValue(json));
        XdmValue value = selector.evaluate();
        System.out.println(value);
    }
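
When the JSON arrives as a stream, the code above stays the same once the whole stream has been read into a single Java string. A minimal sketch of that step, using only the JDK (Java 9 or later for `readAllBytes`) and assuming UTF-8 input; the class and method names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class JsonStreamReader {

    // Reads the complete stream into one String, decoding it as UTF-8,
    // and closes the stream afterwards.
    public static String readJson(InputStream in) {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a real stream; any InputStream works the same way.
        byte[] bytes = "{ \"prop\" : \"value\" }".getBytes(StandardCharsets.UTF_8);
        String json = readJson(new ByteArrayInputStream(bytes));
        System.out.println(json);
    }
}
```

The returned string is what would be wrapped in `new XdmAtomicValue(json)` and bound to `$json`, however deeply nested the JSON is.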

If you have a file with JSON, pass its file name or, more generally, its URI as a variable to XPath and call json-doc (https://www.w3.org/TR/xpath-functions/#func-json-doc) on the variable:

    compiler = processor.newXPathCompiler();
    
    compiler.declareVariable(new QName("json-uri"));
    
    executable = compiler.compile("json-doc($json-uri)");
    
    selector = executable.load();
    
    selector.setVariable(new QName("json-uri"), new XdmAtomicValue("example1.json")); // pass in a relative (e.g. 'example.json' or 'subdir/example.json') or an absolute URI (e.g. 'file:///C:/dir/subdir/example.json' or 'http://example.com/example.json') here, not an OS specific file path
    
    XdmValue value = selector.evaluate();
    
    System.out.println(value);
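
Since json-doc expects a URI, an OS-specific path such as a Windows path with backslashes has to be converted first. One way to do that, sketched here with the JDK's `Path.toUri()` (the class and the helper name `toUriString` are hypothetical):

```java
import java.nio.file.Paths;

public class JsonFileUri {

    // Converts an OS-specific file path into an absolute file: URI string.
    public static String toUriString(String osPath) {
        return Paths.get(osPath).toAbsolutePath().toUri().toString();
    }

    public static void main(String[] args) {
        // Prints a file: URI; a relative path is resolved against the working directory.
        System.out.println(toUriString("files/SouthWind.json"));
    }
}
```

The resulting string can then be passed as the `XdmAtomicValue` bound to `$json-uri`.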

Of course you can separate the steps and parse a string to an XdmValue or a file to an XdmValue and then pass it in later as a variable to another XPath evaluation.
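
A minimal sketch of that separation, assuming a Saxon release with XPath 3.1 support (9.8 or later) on the classpath and using `Processor(false)` for the open-source edition. The class name, the helper methods and the variable name `$data` are made up for illustration:

```java
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmValue;

public class ParseThenQuery {

    // Step 1: parse a JSON string into an XDM value.
    public static XdmValue parse(Processor processor, String json) throws SaxonApiException {
        XPathCompiler compiler = processor.newXPathCompiler();
        compiler.declareVariable(new QName("json"));
        XPathSelector selector = compiler.compile("parse-json($json)").load();
        selector.setVariable(new QName("json"), new XdmAtomicValue(json));
        return selector.evaluate();
    }

    // Step 2: pass the parsed value to another expression as the variable $data.
    public static XdmValue query(Processor processor, XdmValue data, String xpath)
            throws SaxonApiException {
        XPathCompiler compiler = processor.newXPathCompiler();
        compiler.declareVariable(new QName("data"));
        XPathSelector selector = compiler.compile(xpath).load();
        selector.setVariable(new QName("data"), data);
        return selector.evaluate();
    }

    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);
        XdmValue data = parse(processor, "{ \"prop\" : \"value\" }");
        // Looks up the entry with key "prop" in the parsed map.
        System.out.println(query(processor, data, "$data?prop"));
    }
}
```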

So let's assume you have employees.json containing

{ 
    "employees": [ 
        { 
          "name": "mike",
          "department": "accounting",
          "age": 34 
        },
        { 
          "name": "sally",
          "department": "sales",
          "age": 24
        }
      ]
}

then you can parse it with the second sample into an XdmValue value and use that further as the context item for an expression, e.g.

avg(?employees?*?age)

would compute the average age:

        Processor processor = new Processor(true);

        XPathCompiler compiler = processor.newXPathCompiler();

        compiler.declareVariable(new QName("json-uri"));

        XPathExecutable executable = compiler.compile("json-doc($json-uri)");

        XPathSelector selector = executable.load();

        selector.setVariable(new QName("json-uri"), new XdmAtomicValue("employees.json"));

        XdmValue value = selector.evaluate();

        System.out.println(value);

        executable = compiler.compile("avg(?employees?*?age)");

        selector = executable.load();

        selector.setContextItem((XdmItem) value);
        
        XdmItem result = selector.evaluateSingle();

        System.out.println(result);

At https://xqueryfiddle.liberty-development.net/94hwphZ I have another sample processing JSON. It also computes the average of a value with an expression using the lookup operator ?: first ?Students selects the Students item of the context map, then ?* on the returned array gives a sequence of all array items, and finally ?Grade selects the Grade value of each array item:

avg(?Students?*!(?Grade, 70)[1])

but with the additional requirement to select a default of 70 for those objects/maps that don't have a Grade. The sample JSON is

{
  "Class Name": "Science",
  "Teacher\u0027s Name": "Jane",
  "Semester": "2019-01-01",
  "Students": [
    {
      "Name": "John",
      "Grade": 94.3
    },
    {
      "Name": "James",
      "Grade": 81.0
    },
    {
      "Name": "Julia",
      "Grade": 91.9
    },
    {
      "Name": "Jessica",
      "Grade": 72.4
    },
    {
      "Name": "Johnathan"
    }
  ],
  "Final": true
}

The fiddle supports XQuery 3.1 but, as with XPath 3.1, the JSON is passed in as a variable and then parsed with parse-json into an XDM item that serves as the context item for further evaluation.

To give some examples of more complex XPath 3.1 expressions against JSON I have taken the JSON sample from the path examples in https://github.com/json-path/JsonPath as the JSON input to parse-json (if you have a string) or json-doc (if you have a URI to a file or even an HTTP(S) location) and used it as the context item for some paths (evaluated in the fiddle as XQuery 3.1, but XPath 3.1 is a subset and I think I have restricted the samples to XPath 3.1).

The samples are at:

The file is

{
    "store": {
        "book": [
            {
                "category": "reference",
                "author": "Nigel Rees",
                "title": "Sayings of the Century",
                "price": 8.95
            },
            {
                "category": "fiction",
                "author": "Evelyn Waugh",
                "title": "Sword of Honour",
                "price": 12.99
            },
            {
                "category": "fiction",
                "author": "Herman Melville",
                "title": "Moby Dick",
                "isbn": "0-553-21311-3",
                "price": 8.99
            },
            {
                "category": "fiction",
                "author": "J. R. R. Tolkien",
                "title": "The Lord of the Rings",
                "isbn": "0-395-19395-8",
                "price": 22.99
            }
        ],
        "bicycle": {
            "color": "red",
            "price": 19.95
        }
    },
    "expensive": 10
}
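
With that JSON parsed into a map and set as the context item, expressions along the following lines select parts of it. They are illustrative sketches, not necessarily the ones from the fiddle samples; the class name `StoreQueries` and the shortened inline JSON are made up for this example, and a Saxon release with XPath 3.1 support (9.8 or later) is assumed:

```java
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmItem;
import net.sf.saxon.s9api.XdmValue;

public class StoreQueries {

    // Shortened version of the store JSON, kept inline so the sketch is self-contained.
    static final String STORE_JSON =
        "{ \"store\": { \"book\": ["
        + "{ \"category\": \"reference\", \"author\": \"Nigel Rees\","
        + "  \"title\": \"Sayings of the Century\", \"price\": 8.95 },"
        + "{ \"category\": \"fiction\", \"author\": \"Herman Melville\","
        + "  \"title\": \"Moby Dick\", \"isbn\": \"0-553-21311-3\", \"price\": 8.99 },"
        + "{ \"category\": \"fiction\", \"author\": \"J. R. R. Tolkien\","
        + "  \"title\": \"The Lord of the Rings\", \"isbn\": \"0-395-19395-8\", \"price\": 22.99 }"
        + "] }, \"expensive\": 10 }";

    // Parses the JSON string, then evaluates the expression with the parsed map as context item.
    public static XdmValue evaluate(Processor processor, String json, String xpath)
            throws SaxonApiException {
        XPathCompiler parseCompiler = processor.newXPathCompiler();
        parseCompiler.declareVariable(new QName("json"));
        XPathSelector parser = parseCompiler.compile("parse-json($json)").load();
        parser.setVariable(new QName("json"), new XdmAtomicValue(json));
        XdmItem context = parser.evaluateSingle();

        XPathSelector selector = processor.newXPathCompiler().compile(xpath).load();
        selector.setContextItem(context);
        return selector.evaluate();
    }

    public static void main(String[] args) throws SaxonApiException {
        Processor processor = new Processor(false);
        String[] expressions = {
            "?store?book?*?author",              // authors of all books
            "?store?book?2?title",               // title of the second book
            "?store?book?*[?price < 10]?title",  // titles of books cheaper than 10
            "?store?book?*[exists(?isbn)]?title" // titles of books that have an isbn
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> "
                + evaluate(processor, STORE_JSON, expression));
        }
    }
}
```

The predicates in square brackets work as they do for nodes: inside them the context item is the current map, so `?price` looks up that map's price entry.
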
Martin Honnen
  • So I tried to pass a filename ("C:\\src\TestSaxon\\files\\SouthWind.json") and got Exception in thread "main" net.sf.saxon.s9api.SaxonApiException: Illegal character in opaque part: C:\src\TestSaxon\files\SouthWind.json – David Thielen Aug 08 '20 at 16:25
  • For the pass in as a string, if I have a large complex JSON, is there a way to pass the entire string in rather than passing it in element by element? – David Thielen Aug 08 '20 at 16:25
  • Once I have it loaded, how do I then run an XPath query (like "/windward-studios/Employees/Employee[@EmployeeID < $p1]") against it? – David Thielen Aug 08 '20 at 16:27
  • @DavidThielen, as hinted at in previous answers and comments on that topic, XPath 3.1 allows you to write expressions against "JSON" represented as map and/or arrays but these are not XPath path expressions with steps separated by `/` as you use to traverse an XML structure, instead there are new expressions based on the lookup operator `?` (https://www.w3.org/TR/xpath-31/#id-lookup) and/or considering maps and arrays functions. Additionally there is the XPath 3.1 function library extended to have functions in a namespace for maps and one in the namespace for arrays. – Martin Honnen Aug 08 '20 at 16:43
  • The XPath 3.1 spec has all the details but I think the Altova guys have produced a version of that at https://www.altova.com/training/xpath3/xpath-31#lookup-operator which is better usable as a tutorial on how to use XPath 3.1. See the sections https://www.altova.com/training/xpath3/xpath-31#maps and https://www.altova.com/training/xpath3/xpath-31#arrays. – Martin Honnen Aug 08 '20 at 16:51
  • If you want to write "XPath steps" against JSON then you would need to use `json-to-xml` on the JSON to work with its XML representation. That representation works for any JSON and round-trips but is not necessarily the most straightforward representation for selection as e.g. `{ "foo" : "bar" }` is represented as `<map xmlns="http://www.w3.org/2005/xpath-functions"><string key="foo">bar</string></map>` so you need to select `/map/string[@key = 'foo']` and not `/map/foo`. – Martin Honnen Aug 08 '20 at 16:52
  • For the error you get it might be better that you put that in a separate question with the necessary minimal but complete details to reproduce it. – Martin Honnen Aug 08 '20 at 16:55
  • As for having a string with "large" JSON, the first example declaring a variable named `json` and using `parse-json($json)` on the XPath side should be able to handle any size of JSON fitting into a string, I just put in 6 simple, different examples that should show how the different JSON types are mapped to XDM types (number -> `xs:double`, string -> `xs:string`, boolean -> `xs:boolean`, null -> empty sequence, object -> map, array -> array). – Martin Honnen Aug 08 '20 at 16:59
  • I have a very basic question here - that may kill my wanting to use this. Can I get the full power of XPath queries against JSON? Maybe different syntax but can I somehow get the equivalent of "/windward-studios/Employees/Employee[@EmployeeID < $p1]"? Or is it more basic where I can walk to an element but that's about it? – David Thielen Aug 09 '20 at 19:23
  • The parse-json example you show has you walking the JSON elements in the raw JSON to pass in as atomic variables. If I have a 512M JSON file, that's a ton of walking the source. – David Thielen Aug 09 '20 at 19:25
  • @DavidThielen, are you talking about `String[] jsonExamples = { "1", "true", "null", "\"string\"", "[1,2,3]", "{ \"prop\" : \"value\" }" };`? As I said, I used that to demonstrate parsing the different types of JSON that exist in the same sample code. It is not supposed to tell you that you need to break up your complex JSON, it just serves as a single example parsing the different data types JSON has. So throw in any more complex JSON object or array, the Java code will be the same (you will have just one string to parse, not 6). – Martin Honnen Aug 09 '20 at 19:31
  • @DavidThielen, as for the full power of XPath, sometimes it seems you don't take the time to read and understand answers and resources linked to in answers or comments. The XPath 3.1 way to represent JSON objects is XDM maps and the XPath 3.1 way to represent JSON arrays is XDM arrays, the way you query them is shown in the links and in the samples. And I said that classic XPath steps separated by `/` don't apply to maps or arrays, that is for nodes. – Martin Honnen Aug 09 '20 at 19:34
  • Yes I understand that. But every example I see for arrays or maps is to return the entire array or map from a specific xpath statement. I haven't seen any examples that allow for conditionals on what parts to return from the array/map. Does that exist? – David Thielen Aug 09 '20 at 20:13
  • Let me go into more detail on what I am struggling with here. And all of this is using a small simple JSON file (basically the Northwind database as JSON). Our customers have JSON that is a lot more complex. First, is how do I pass this in to Saxon. Generally we get the JSON as a stream so we need to read it in as a string and pass it in - but we don't know the structure and so the example of walking elements to pass in XdmAtomicValue objects I don't think will work. – David Thielen Aug 09 '20 at 20:25
  • Second, we generally need to query where objects are nested in the JSON and we need to get all grandchildren of a parent object where a property in the grand child matches a comparison test. And to then get back the object where different properties can individually be pulled back. So no / in the query, but then how do we get to that grandchild and then have a conditional? – David Thielen Aug 09 '20 at 20:26
  • You can select e.g. `?foo` to select only the `foo` item from a map; if it is a "complex" value like another map or an array, you select that map or array. The same holds for arrays, e.g. `?2` selects the second item from an array, and if that is a map or an array you have selected a "complex" part of an array, just as you would select a child element in XML. As for predicates, yes, they do continue to exist. I think it is better you post a separate question with some JSON sample and explain exactly what you consider a parent/child/grandchild in that context. – Martin Honnen Aug 09 '20 at 20:30
  • Yes, once I figure out how to load my Southwind.json file and can start running queries against it, then I think I'll understand this better. Thanks – David Thielen Aug 09 '20 at 22:53
  • Hi; To get back to the original question (my fault the comments got off track), the parse-json example you show has you walking the JSON elements in the raw JSON to pass in as atomic variables. Is there a way to pass as a single variable the entire JSON string which is a very complex set of nested JSON objects? – David Thielen Aug 10 '20 at 15:31
  • @DavidThielen, sorry, I don't know what to show and explain more, I showed 6 different JSON samples in one example simply to show the six different types of JSON are all handled by `parse-json`. And it doesn't matter to `parse-json` whether the string you pass in is a simple JSON object or array, it takes a string with any JSON to parse it. After parsing the JSON string on the XPath 3.1 side the `parse-json` will give you one of the types `xs:double`, `xs:boolean`, `xs:string`, empty sequence or `map(*)` or `array(*)`. – Martin Honnen Aug 10 '20 at 15:41
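
The `json-to-xml` route mentioned in the comments, where classic path steps with `/` and predicates do apply, could be sketched roughly as follows. This assumes a Saxon release with XPath 3.1 support (9.8 or later); the class name, the helper methods and the prefix `j` are made up for illustration, and the JSON is the employees sample from the answer:

```java
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.QName;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.XPathCompiler;
import net.sf.saxon.s9api.XPathSelector;
import net.sf.saxon.s9api.XdmAtomicValue;
import net.sf.saxon.s9api.XdmItem;
import net.sf.saxon.s9api.XdmValue;

public class JsonAsXml {

    // Namespace of the elements that json-to-xml produces.
    static final String FN_NS = "http://www.w3.org/2005/xpath-functions";

    // Converts a JSON string into the XML representation defined by json-to-xml.
    public static XdmItem toXml(Processor processor, String json) throws SaxonApiException {
        XPathCompiler compiler = processor.newXPathCompiler();
        compiler.declareVariable(new QName("json"));
        XPathSelector selector = compiler.compile("json-to-xml($json)").load();
        selector.setVariable(new QName("json"), new XdmAtomicValue(json));
        return selector.evaluateSingle();
    }

    // Evaluates a path expression against that XML; the prefix j is bound to FN_NS.
    public static XdmValue select(Processor processor, XdmItem doc, String path)
            throws SaxonApiException {
        XPathCompiler compiler = processor.newXPathCompiler();
        compiler.declareNamespace("j", FN_NS);
        XPathSelector selector = compiler.compile(path).load();
        selector.setContextItem(doc);
        return selector.evaluate();
    }

    public static void main(String[] args) throws SaxonApiException {
        String json = "{ \"employees\": ["
            + "{ \"name\": \"mike\", \"department\": \"accounting\", \"age\": 34 },"
            + "{ \"name\": \"sally\", \"department\": \"sales\", \"age\": 24 } ] }";
        Processor processor = new Processor(false);
        XdmItem doc = toXml(processor, json);
        // Names of the employees whose age is below 30.
        String path = "/j:map/j:array[@key = 'employees']/j:map"
            + "[j:number[@key = 'age'] < 30]/j:string[@key = 'name']";
        System.out.println(select(processor, doc, path));
    }
}
```

The trade-off is the one described above: every step has to go through the generic `map`, `array`, `string` and `number` elements and their `key` attributes rather than through the JSON property names directly.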