Our XML data is stored in separate files so that staff can work independently on small modules. The separate files are then assembled into one master file for further processing. Currently I do this within the IDE of the Oxygen XML Editor. To streamline the process, I would like to do it from the command line, without the IDE. How can I resolve the XInclude statements from the command line with Saxon HE (if this is possible)?

I tried a command like this:

java -jar saxon9he.jar -xi:on -s:main.xml -xsl:assemble.xslt -o:master.xml -t

and got the following error:

Saxon-HE 9.9.1.4J from Saxonica
Java version 1.8.0_191
Stylesheet compilation time: 361.152836ms
Processing file:/u:/Wolke/xml/resolve-xi/main.xml
Using parser com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser
Building tree for file:/u:/Wolke/xml/resolve-xi/main.xml using class net.sf.saxon.tree.tiny.TinyBuilder
Exception in thread "main" java.lang.StackOverflowError
        at java.security.AccessController.doPrivileged(Native Method)
        at com.sun.org.apache.xerces.internal.utils.SecuritySupport.getContextClassLoader(Unknown Source)
        at com.sun.org.apache.xerces.internal.utils.ObjectFactory.findClassLoader(Unknown Source)
        at com.sun.org.apache.xerces.internal.utils.ObjectFactory.newInstance(Unknown Source)
        at com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.handleIncludeElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.xinclude.XIncludeHandler.emptyElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.scanStartElement(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(Unknown Source)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
[and many more lines]

Saxonica's documentation on the -xi:on option says: "Apply XInclude processing to all input XML documents (including schema and stylesheet modules as well as source documents). This currently only works when documents are parsed using the Xerces parser, which is the default in JDK 1.5 and later." (https://www.saxonica.com/documentation9.5/using-xsl/commandline.html) I am not sure what this means.
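
As a rough illustration of what the quoted sentence implies: XInclude is resolved by the XML parser while the document is read, not by the XSLT processor itself. The following is a minimal sketch, assuming the JDK's built-in Xerces-derived JAXP parser is the one in use. The class name XIncludeDemo and the files part.xml and main.xml (written to a temporary directory) are made up for illustration, and this is not Saxon's own code.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class XIncludeDemo {

    // Writes two hypothetical files to a temp directory, parses the main one
    // with XInclude enabled, and returns how many <p> elements the result has.
    static int countIncludedParagraphs() throws Exception {
        Path dir = Files.createTempDirectory("xi-demo");
        Files.write(dir.resolve("part.xml"),
                "<p>included text</p>".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("main.xml"),
                ("<doc><xi:include xmlns:xi=\"http://www.w3.org/2001/XInclude\""
                        + " href=\"part.xml\"/></doc>").getBytes(StandardCharsets.UTF_8));

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // XInclude relies on namespace processing
        factory.setXIncludeAware(true);  // parser-level XInclude switch

        // Parsing from a File gives the parser a base URI for resolving href.
        Document doc = factory.newDocumentBuilder()
                .parse(dir.resolve("main.xml").toFile());
        return doc.getElementsByTagName("p").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("<p> elements after inclusion: " + countIncludedParagraphs());
    }
}
```

The sketch deliberately includes a whole document and leaves out the xpointer attribute, so it says nothing about whether pointers to xml:id values are supported.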

Main XML file:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader xml:id="header">
        <fileDesc>
            <titleStmt><title>Trying to make XInclude work</title></titleStmt>
            <publicationStmt><p>Sample data for stackoverflow question</p></publicationStmt>
            <sourceDesc><p>Just made up</p></sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file1.xml" xpointer="content-p1"/>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file2.xml" xpointer="content-p2"/>
            <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="file3.xml" xpointer="content-p3"/>
        </body>
    </text>
</TEI>

XML component files:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
    <text>
        <body>
            <div type="page" xml:id="content-p1">
                <p> Integer sit amet justo porta nisl porta aliquet in a justo.</p>
            </div>
        </body>
    </text>
</TEI>

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
   <text>
      <body>
         <div type="page" xml:id="content-p2">
            <p>Quisque gravida venenatis varius.</p>
         </div>
      </body>
   </text>
</TEI>

<?xml version="1.0" encoding="UTF-8"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?>
<?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml"
    schematypens="http://purl.oclc.org/dsdl/schematron"?>
<?xml-stylesheet type="text/css" href="../css/mm-xml.css"?>

<TEI xmlns="http://www.tei-c.org/ns/1.0">
   <xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="main.xml" xpointer="header"/>
   <text>
      <body>
         <div type="page" xml:id="content-p3">
            <p>Nullam nisi lacus, malesuada vel eros porta, dictum finibus mauris.</p>
         </div>
      </body>
   </text>
</TEI>

XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

The output I need (as the Oxygen IDE produces it):

<?xml version="1.0" encoding="UTF-8"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://relaxng.org/ns/structure/1.0"?><?xml-model href="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="application/xml" schematypens="http://purl.oclc.org/dsdl/schematron"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <teiHeader xml:id="header">
        <fileDesc>
            <titleStmt>
                <title>Trying to make XInclude work</title>
            </titleStmt>
            <publicationStmt>
                <p>Sample data for stackoverflow question</p>
            </publicationStmt>
            <sourceDesc>
                <p>Just made up</p>
            </sourceDesc>
        </fileDesc>
    </teiHeader>
    <text>
        <body>
            <div type="page" xml:id="content-p1" xml:base="file1.xml">
                <p> Integer sit amet justo porta nisl porta aliquet in a justo.</p>
            </div>
            <div type="page" xml:id="content-p2" xml:base="file2.xml">
                <p>Quisque gravida venenatis varius.</p>
            </div>
            <div type="page" xml:id="content-p3" xml:base="file3.xml">
                <p>Nullam nisi lacus, malesuada vel eros porta, dictum finibus mauris.</p>
            </div>
        </body>
    </text>
</TEI>
  • Please run Saxon with the `-t` option and add the exact version output of Saxon and the Java JRE to your question description. – Martin Honnen Aug 02 '19 at 11:30
  • I can't explain the stacktrace you have shown but in a local test with Saxon 9.9.1.4 HE and JRE 1.8 it seems that Xerces as the XML parser supports the XInclude directive but not any `xml:id` based `xpointer` reference. This seems in line with https://xerces.apache.org/xerces2-j/faq-xinclude.html#faq-8. I am not sure how oXygen achieves that they do work, it might help asking their support if they have a special version of Xerces or some particular configuration that makes the `xml:id` based `xpointer` references work. – Martin Honnen Aug 02 '19 at 12:23
  • Thanks for the feedback, Martin Honnen, I added the -t parameter and edited the question description accordingly. – Martin Hinze Aug 02 '19 at 12:40
  • I didn't expect that Xerces would not support the xpointer() function. Thanks for the tip, I will ask the support. – Martin Hinze Aug 02 '19 at 12:50
  • It seems Xerces supports `xpointer="some-id"` if the document you are including has a DTD based declaration of an element with a particular attribute declared of type `ID` and the value `some-id` but somehow the `xml:id` based declaration is not supported. – Martin Honnen Aug 02 '19 at 15:19
  • The StackOverflowError suggests to me that you have a circular XInclude reference here - main.xml is including file1.xml and file1.xml is including main.xml. That clearly can't work. – Michael Kay Aug 02 '19 at 16:37
  • Actually it is working fine (at least inside the oXygen IDE): main.xml includes only the content elements from the source files. The source files only contain the header of the main file, so there is no circular reference. – Martin Hinze Aug 02 '19 at 16:50
  • It seems http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng, the schema referenced in your TEI documents, declares the `xml:id` attribute. Not sure whether that is what helps or enables oXygen to support/enable/configure the `xpointer` to a referenced XML with an `xml:id` attribute to work. No idea either whether there is a way to set up Saxon with an XML parser taking RNG schemas and IDs into account for XInclude. Any news from the oXygen guys how they make that work? – Martin Honnen Aug 02 '19 at 18:12
  • So far no answer from them, but it's not super urgent. oXygen is not only for TEI XML, so I wouldn't expect a connection here, but I don't know. – Martin Hinze Aug 02 '19 at 18:36
  • It seems indeed not to depend on the use of TEI, an XInclude with an `xpointer` to an `xml:id` attribute in a referenced document works with Saxon 9.8 HE inside oXygen but fails (although for me with a warning that the XPointer reference failed) with Saxon 9.9 or 9.8 HE outside of oXygen. – Martin Honnen Aug 02 '19 at 19:31
  • This is oXygen's documentation about the usage of XInclude and xpointer: https://www.oxygenxml.com/doc/versions/21.1/ug-editor/topics/including-document-parts-with-XInclude.html – Martin Hinze Aug 03 '19 at 11:25
  • Have the oXygen guys been able to explain whether they use a specialized parser or a certain setting/configuration to have XInclude/xpointer support based on `xml:id` while Saxon's normal command line use of Xerces does not provide that? – Martin Honnen Aug 08 '19 at 12:34
  • Octavian Nadolu from oXygen kindly wrote me two e-mails. oXygen is using a patched version of Xerces (https://issues.apache.org/jira/browse/XERCESJ-1113, https://mvnrepository.com/artifact/com.oxygenxml/oxygen-patched-xerces/21.1.0.2). The command should look like this: java -cp "patched-xerces.jar;saxon.jar" net.sf.saxon.Transform -xi:on -s:main.xml -xsl:assemble.xslt Unfortunately I get the error "Hauptklasse net.sf.saxon.Transfom konnte nicht gefunden oder geladen werden" (like "main class net.sf.saxon.Transfom could not be found"). – Martin Hinze Aug 08 '19 at 12:43
  • Well, the name of Saxon 9 HE's jar file is usually `saxon9he.jar` and not `saxon.jar`, so it seems you might simply not have used the right `-cp` argument if you want to use Saxon 9 HE from the command line. – Martin Honnen Aug 11 '19 at 10:28

1 Answer

Based on our exchange of comments and the advice you got from the oXygen support, it looks like using oXygen's patched version of Xerces (available at https://mvnrepository.com/artifact/com.oxygenxml/oxygen-patched-xerces/21.1.0.2) together with Saxon 9.9 HE should enable XInclude with xpointer references to xml:id attributes:

java -cp 'oxygen-patched-xerces-21.1.0.2.jar;saxon9he.jar' net.sf.saxon.Transform -t -s:input.xml -xsl:sheet.xsl -xi:on

This is the command line I used and tested in a Windows 10 PowerShell window. Depending on the platform and command-line shell, you might need different quote characters for the -cp argument and a different separator between the jar files listed there.

Martin Honnen
  • Just realized that the classpath separator is a colon on Linux/macOS (e.g. in Bash). So there the command would be: `java -cp 'oxygen-patched-xerces-21.1.0.2.jar:saxon9he.jar' net.sf.saxon.Transform -t -s:input.xml -xsl:sheet.xsl -xi:on` – Martin Hinze Aug 21 '19 at 12:49
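
The platform-dependent separator discussed in the answer and the comment above can be looked up from Java itself via `java.io.File.pathSeparator` (";" on Windows, ":" on Unix-like systems). The following is only a small sketch: the class name ClasspathDemo and the helper joinClasspath are hypothetical, and the jar names are the ones quoted above, assumed to sit in the current directory.

```java
import java.io.File;

public class ClasspathDemo {

    // Joins jar names with the classpath separator of the current platform.
    static String joinClasspath(String... jars) {
        return String.join(File.pathSeparator, jars);
    }

    public static void main(String[] args) {
        String cp = joinClasspath("oxygen-patched-xerces-21.1.0.2.jar", "saxon9he.jar");
        // Prints the command with the separator that fits this machine;
        // the quoting may still need adjusting for the shell in use.
        System.out.println("java -cp \"" + cp
                + "\" net.sf.saxon.Transform -t -s:input.xml -xsl:sheet.xsl -xi:on");
    }
}
```

Running it on the machine where the transformation will run prints a command line with the matching separator.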