Given a list of XPath expressions, I want to write a stylesheet that runs through an XML document and outputs the same document, but with a comment inserted before the node identified by each XPath expression. Let's make up an example. Start with an XML instance holding the XPath expressions:

<paths>
  <xpath location="/root/a" annotate="1"/>
  <xpath location="/root/a/b" annotate="2"/>
</paths>

Given the input:

<root>
  <a>
    <b>B</b>
  </a>
  <c>C</c>
</root>

It should produce:

<root>
  <!-- 1 -->
  <a>
    <!-- 2 -->
    <b>B</b>
  </a>
  <c>C</c>
</root>

My initial thought is to have an identity stylesheet that takes a file-list param and calls the document function on it to get the list of xpath nodes. It would then check each node of the input against that list and insert the comment node when it finds a match. I expect that might be highly inefficient as the list of XPaths gets large (or maybe not, tell me; I'm using Saxon 9).

So my question: Is there an efficient way to do something like this?

stand

3 Answers

Assuming Saxon 9 PE or EE, it should also be possible to make use of XSLT 3.0 and xsl:evaluate as follows:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs math map mf"
    version="3.0">

    <xsl:output indent="yes"/>

    <xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
    <xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>

    <xsl:variable name="main-root" select="/"/>

    <xsl:variable name="mapped-nodes">
        <map>
            <xsl:for-each select="$paths-doc/paths/xpath">
                <xsl:variable name="node" as="node()?" select="mf:evaluate(@location, $main-root)"/>
                <xsl:if test="$node">
                    <entry key="{generate-id($node)}">
                        <xsl:value-of select="@annotate"/>
                    </entry>
                </xsl:if>
            </xsl:for-each>
        </map>
    </xsl:variable>

    <xsl:key name="node-by-id" match="map/entry" use="@key"/>

    <xsl:function name="mf:evaluate" as="node()?">
        <xsl:param name="path" as="xs:string"/>
        <xsl:param name="context" as="node()"/>
        <xsl:evaluate xpath="$path" context-item="$context"></xsl:evaluate>
    </xsl:function>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="node()[key('node-by-id', generate-id(), $mapped-nodes)]">
        <xsl:comment select="key('node-by-id', generate-id(), $mapped-nodes)"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>


</xsl:stylesheet>

Here is an edited version of the originally posted code that uses the XSLT 3.0 map feature instead of a temporary document to store the association between the generated id of a node found by dynamic XPath evaluation and the annotation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs math map mf"
    version="3.0">

    <xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
    <xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>

    <xsl:output indent="yes"/>

    <xsl:variable 
        name="mapped-nodes"
        as="map(xs:string, xs:string)"
        select="map:new(for $path in $paths-doc/paths/xpath, $node in mf:evaluate($path/@location, /) return map:entry(generate-id($node), string($path/@annotate)))"/>

    <xsl:function name="mf:evaluate" as="node()?">
        <xsl:param name="path" as="xs:string"/>
        <xsl:param name="context" as="node()"/>
        <xsl:evaluate xpath="$path" context-item="$context"></xsl:evaluate>
    </xsl:function>

    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="node()[map:contains($mapped-nodes, generate-id())]">
        <xsl:comment select="$mapped-nodes(generate-id())"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>


</xsl:stylesheet>

Like the first stylesheet, it needs Saxon 9.5 PE or EE to run.
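
A note for later processors: `map:new` comes from the XSLT 3.0 working draft that Saxon 9.5 implemented. The final XSLT 3.0 recommendation does not define `map:new`; `map:merge` takes its place. The following is a minimal, untested sketch of the same stylesheet under that assumption. It presumes a processor that supports `xsl:evaluate` and the final map functions, and it reuses the `paths1.xml` default and the `mf` namespace from the code above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs map mf"
    version="3.0">

    <xsl:param name="paths-url" as="xs:string" select="'paths1.xml'"/>
    <xsl:param name="paths-doc" as="document-node()" select="doc($paths-url)"/>

    <xsl:output indent="yes"/>

    <!-- Same lookup table as above (generated id -> annotation),
         built with map:merge instead of the draft-era map:new -->
    <xsl:variable
        name="mapped-nodes"
        as="map(xs:string, xs:string)"
        select="map:merge(for $path in $paths-doc/paths/xpath, $node in mf:evaluate($path/@location, /) return map:entry(generate-id($node), string($path/@annotate)))"/>

    <!-- Evaluate one XPath string against the main input document -->
    <xsl:function name="mf:evaluate" as="node()?">
        <xsl:param name="path" as="xs:string"/>
        <xsl:param name="context" as="node()"/>
        <xsl:evaluate xpath="$path" context-item="$context"/>
    </xsl:function>

    <!-- Identity template -->
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Nodes found in the map get their annotation as a preceding comment -->
    <xsl:template match="node()[map:contains($mapped-nodes, generate-id())]">
        <xsl:comment select="$mapped-nodes(generate-id())"/>
        <xsl:text>&#10;</xsl:text>
        <xsl:copy>
            <xsl:apply-templates select="@* , node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

Apart from the function name, the logic is unchanged: each path is evaluated once up front, and every node visited during the copy costs one map lookup.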

Martin Honnen
  • I like how by leveraging XSLT 3.0's XPath evaluation facilities, your answer appears to be able to provide even more robust matching than template/@match. +1 – kjhughes Oct 05 '13 at 15:15

Overview:

Write a meta XSLT transformation that takes the paths file as input and produces a new XSLT transformation as output. This new XSLT will transform from your root input XML to the annotated copy output XML.

Notes:

  1. Works with XSLT 1.0, 2.0, or 3.0.
  2. Should be very efficient, especially if the generated transformation has to be run over a large input or has to be run repeatedly, because it effectively compiles into native XSLT rather than reimplementing matching with an XSLT-based interpreter.
  3. Is more robust than approaches that have to rebuild element ancestry manually in code. Since it maps the paths to template/@match attributes, the full sophistication of @match patterns is available efficiently. I've included an attribute value test as an example.
  4. Be sure to consider elegant XSLT 2.0 and 3.0 solutions by @DanielHaley and @MartinHonnen, especially if an intermediate meta XSLT file won't work for you. By leveraging XSLT 3.0's XPath evaluation facilities, @MartinHonnen's answer appears to be able to provide even more robust matching than template/@match does here.

This input XML that specifies XPaths and annotations:

<paths>
  <xpath location="/root/a" annotate="1"/>
  <xpath location="/root/a/b" annotate="2"/>
  <xpath location="/root/c[@x='123']" annotate="3"/>
</paths>

When input to this meta XSLT transformation:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/paths">
    <xsl:element name="xsl:stylesheet">
      <xsl:attribute name="version">1.0</xsl:attribute>
      <xsl:element name="xsl:output">
        <xsl:attribute name="method">xml</xsl:attribute>
        <xsl:attribute name="indent">yes</xsl:attribute>
      </xsl:element>
      <xsl:call-template name="gen_identity_template"/>
      <xsl:apply-templates select="xpath"/>
    </xsl:element>
  </xsl:template>

  <xsl:template name="gen_identity_template">
    <xsl:element name="xsl:template">
      <xsl:attribute name="match">node()|@*</xsl:attribute>
      <xsl:element name="xsl:copy">
        <xsl:element name="xsl:apply-templates">
          <xsl:attribute name="select">node()|@*</xsl:attribute>
        </xsl:element>
      </xsl:element>
    </xsl:element>
  </xsl:template>

  <xsl:template match="xpath">
    <xsl:element name="xsl:template">
      <xsl:attribute name="match">
        <xsl:value-of select="@location"/>
      </xsl:attribute>
      <xsl:element name="xsl:comment">
        <xsl:value-of select="@annotate"/>
      </xsl:element>
      <xsl:element name="xsl:text">
        <xsl:text disable-output-escaping="yes">&amp;#xa;</xsl:text>
      </xsl:element>
      <xsl:element name="xsl:copy">
        <xsl:element name="xsl:apply-templates">
          <xsl:attribute name="select">node()|@*</xsl:attribute>
        </xsl:element>
      </xsl:element>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

Will produce this XSLT transformation:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
   <xsl:output method="xml" indent="yes"/>
   <xsl:template match="node()|@*">
      <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
   </xsl:template>
   <xsl:template match="/root/a">
      <xsl:comment>1</xsl:comment>
      <xsl:text>&#xa;</xsl:text>
      <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
   </xsl:template>
   <xsl:template match="/root/a/b">
      <xsl:comment>2</xsl:comment>
      <xsl:text>&#xa;</xsl:text>
      <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
   </xsl:template>
   <xsl:template match="/root/c[@x='123']">
      <xsl:comment>3</xsl:comment>
      <xsl:text>&#xa;</xsl:text>
      <xsl:copy>
         <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

Which, when provided this input XML file:

<root>
  <a>
    <b>B</b>
  </a>
  <c x="123">C</c>
</root>

Will produce the desired output XML file:

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <!--1-->
   <a>
    <!--2-->
      <b>B</b>
  </a>
  <!--3-->
   <c x="123">C</c>
</root>
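
As a side note on the meta transformation above: XSLT 1.0 also provides `xsl:namespace-alias`, which lets the generated stylesheet be written as literal result elements rather than `xsl:element`/`xsl:attribute` calls, and avoids `disable-output-escaping`. The following is an untested sketch of that alternative, not the code that produced the output shown above. The `axsl` prefix and its alias namespace URI are illustrative choices.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">

  <!-- Elements written with the axsl prefix end up in the XSLT namespace in the output -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/paths">
    <axsl:stylesheet version="1.0">
      <axsl:output method="xml" indent="yes"/>
      <!-- Generated identity template -->
      <axsl:template match="node()|@*">
        <axsl:copy>
          <axsl:apply-templates select="node()|@*"/>
        </axsl:copy>
      </axsl:template>
      <!-- One generated template per xpath entry -->
      <xsl:apply-templates select="xpath"/>
    </axsl:stylesheet>
  </xsl:template>

  <xsl:template match="xpath">
    <!-- The attribute value template copies @location into the generated match pattern -->
    <axsl:template match="{@location}">
      <axsl:comment><xsl:value-of select="@annotate"/></axsl:comment>
      <!-- The inner xsl:text keeps the newline from being stripped as whitespace-only text -->
      <axsl:text><xsl:text>&#10;</xsl:text></axsl:text>
      <axsl:copy>
        <axsl:apply-templates select="node()|@*"/>
      </axsl:copy>
    </axsl:template>
  </xsl:template>
</xsl:stylesheet>
```

The prefix used in the generated file may differ between processors, but the elements are in the XSLT namespace either way, so the generated stylesheet should behave like the one shown earlier.
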
kjhughes
  • Good catch, @MartinHonnen. I neglected to update that input XML when I edited my answer to show an example of more robust matching. Fixed now. Thank you. – kjhughes Oct 05 '13 at 14:30
  • I think this is probably the approach I'll take since I'm not at all familiar with 3.0 at this point. I've made some attempts to use the `saxon:evaluate` function which I suspect is about the same as Martin's solution. Regarding this solution, I haven't been able to figure out a way to annotate xpaths that are attribute nodes. e.g. `"/root/c/@x"`. – stand Oct 07 '13 at 23:49

I'm not sure whether kjhughes' suggestion of creating a second transform would be more efficient than your original idea. I do see the possibility of that second transform becoming huge if your paths XML gets large.

Here's how I'd do it...

XML Input

<root>
    <a>
        <b>B</b>
    </a>
    <c>C</c>
</root>

"paths" XML (paths.xml)

<paths>
    <xpath location="/root/a" annotate="1"/>
    <xpath location="/root/a/b" annotate="2"/>
</paths>

XSLT 2.0

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="paths" select="document('paths.xml')"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*" priority="1">
        <xsl:variable name="path">
            <xsl:for-each select="ancestor-or-self::*">
                <xsl:value-of select="concat('/',local-name())"/>
            </xsl:for-each>
        </xsl:variable>
        <xsl:if test="$paths/*/xpath[@location=$path]">
            <xsl:comment select="$paths/*/xpath[@location=$path]/@annotate"/>
        </xsl:if>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

XML Output

<root>
    <!--1-->
    <a>
        <!--2-->
        <b>B</b>
    </a>
    <c>C</c>
</root>
Daniel Haley
  • This is a very different but nice solution too. +1 – kjhughes Oct 05 '13 at 15:13
  • Thanks, @Daniel, this is more or less what I came up with. I guess the follow-up question I have relates to the performance of the `test` predicate in the `xsl:if` element. Is it possible for it to be anything other than `O(n)` where `n` is the number of paths? I don't have a good handle on that. – stand Oct 07 '13 at 01:48
  • Another thing I thought of about this solution. I believe it would fail if your `paths.xml` had weird xpaths. For instance if I replaced `"/root/a/b"` with `/root/*[local-name()='a']/b`. The two should return the same node sets but since the `path` variable is constructed as the string `/root/a/b` it would not match the second one. This is where an `evaluate` function would be the only way out. – stand Oct 07 '13 at 23:16
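
On the performance question about the `xsl:if` test in the XSLT 2.0 stylesheet above: one possible way to avoid scanning every `xpath` element for each input element is to declare an `xsl:key` on `@location` and look the path string up with the three-argument `key()` function. The following is an untested sketch under that assumption. The key name `xpath-by-location` is made up for illustration, and whether the lookup is actually constant time is up to the processor, although processors typically build an index for keys.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="yes"/>
    <xsl:strip-space elements="*"/>

    <xsl:param name="paths" select="document('paths.xml')"/>

    <!-- Index the xpath elements of paths.xml by their location string -->
    <xsl:key name="xpath-by-location" match="xpath" use="@location"/>

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="*" priority="1">
        <!-- Build the /a/b/c style path of the current element as a string -->
        <xsl:variable name="path"
            select="string-join(for $e in ancestor-or-self::* return concat('/', local-name($e)), '')"/>
        <!-- Third argument: search the key in the paths document, not the input document -->
        <xsl:variable name="hit" select="key('xpath-by-location', $path, $paths)"/>
        <xsl:if test="$hit">
            <xsl:comment select="$hit/@annotate"/>
        </xsl:if>
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>
```

This only changes how the lookup is done. It still compares path strings exactly, so a location such as `/root/*[local-name()='a']/b` would still not match.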