Merging pairs of nodes based on attribute, new to template matching

Question

Say I have the following XML:

<root>
   <tokens>
      <token ID="t1">blah</token>
      <token ID="t2">blabla</token>
      <token ID="t3">shovel</token>
   </tokens>

   <relatedStuff>
      <group gID="s1">
        <references tokID="t1"/>
        <references tokID="t2"/>
      </group>

      <group gID="s2">
        <references tokID="t3"/>
      </group>

   </relatedStuff>
</root>

Given that a for-each loop over every token would be inefficient and a bad idea, how would one use template matching to transform this XML into the following?

<s id="everything_merged"> 
    <tok id="t1" gID="s1" >blah</tok> 
    <tok id="t2" gID="s1" >blabla</tok> 

    <tok id="t3" gID="s2" >shovel</tok>
</s>

All I want from each <group> is its "gID", i.e. the gID of the group whose <references> points to the token in <tokens>.

<xsl:for-each select="b:root/a:tokens/a:token">
    <!-- and here some template matching -->
    <xsl:attribute name="gID">
         <xsl:value-of select="--correspondingNode's--@gID"/>
    </xsl:attribute>

</xsl:for-each>

I'm pretty fuzzy on this sort of thing, so thank you very much for any help!

Good question, +1. See my solution for a complete and short, pure "push-style" solution that also uses keys. — Dimitre Novatchev, May 05 '11 at 04:13

score 2 · Accepted Answer · answered May 04 '11 at 18:08

The following stylesheet:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="/">
        <s id="everything_merged">
            <xsl:apply-templates select="/root/tokens/token" />
        </s>
    </xsl:template>
    <xsl:template match="token">
        <tok id="{@ID}" gID="{/root/relatedStuff/group[
                                references[@tokID=current()/@ID]]/@gID}">
            <xsl:apply-templates />
        </tok>
    </xsl:template>
</xsl:stylesheet>

Applied to this input (corrected for well-formedness):

<root>
    <tokens>
        <token ID="t1">blah</token>
        <token ID="t2">blabla</token>
        <token ID="t3">shovel</token>
    </tokens>
    <relatedStuff>
        <group gID="s1">
            <references tokID="t1" />
            <references tokID="t2" />
        </group>
        <group gID="s2">
            <references tokID="t3" />
        </group>
    </relatedStuff>
</root>

Produces:

<s id="everything_merged">
    <tok id="t1" gID="s1">blah</tok>
    <tok id="t2" gID="s1">blabla</tok>
    <tok id="t3" gID="s2">shovel</tok>
</s>

Brilliant. I had no idea you could do this [] nesting `group[references[@tokID=current()/@ID]]`! Thanks a bunch! — Spectraljump, May 05 '11 at 12:09
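
For readers new to the nested predicate, the same lookup can be sketched with a variable in place of `current()`. This is only an equivalent rewrite of the `token` template from the stylesheet above, not part of the original answer; the variable name `id` is an illustrative choice.

```xml
<!-- Drop-in alternative for the token template in the accepted answer -->
<xsl:template match="token">
    <!-- Capture the token's ID while the token is still the context node -->
    <xsl:variable name="id" select="@ID" />
    <!-- Select the group that has at least one references child
         whose tokID equals the captured ID, then take its gID -->
    <tok id="{@ID}" gID="{/root/relatedStuff/group[references/@tokID = $id]/@gID}">
        <xsl:apply-templates />
    </tok>
</xsl:template>
```

Inside a predicate, `.` refers to the node being tested (here a `group` or `references` element), which is why the original expression needs `current()` to reach back to the `token` being processed. Storing the ID in a variable first avoids that.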

score 1 · Answer 2 · answered May 05 '11 at 04:11

A solution using keys and pure "push-style":

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kgIDfromTokId" match="@gID"
  use="../*/@tokID"/>

 <xsl:template match="tokens">
  <s id="everything_merged">
   <xsl:apply-templates/>
  </s>
 </xsl:template>

 <xsl:template match="token">
  <tok id="{@ID}" gID="{key('kgIDfromTokId', @ID)}">
   <xsl:apply-templates/>
  </tok>
 </xsl:template>
</xsl:stylesheet>

When applied to the provided XML document:

<root>
    <tokens>
        <token ID="t1">blah</token>
        <token ID="t2">blabla</token>
        <token ID="t3">shovel</token>
    </tokens>
    <relatedStuff>
        <group gID="s1">
            <references tokID="t1" />
            <references tokID="t2" />
        </group>
        <group gID="s2">
            <references tokID="t3" />
        </group>
    </relatedStuff>
</root>

the wanted, correct result is produced:

<s id="everything_merged">
   <tok id="t1" gID="s1">blah</tok>
   <tok id="t2" gID="s1">blabla</tok>
   <tok id="t3" gID="s2">shovel</tok>
</s>

Very useful answer, thanks. The more you know! This will come in useful, and it seems to be more elegant. Is there any performance difference between your approach and Lwburk's? (I should read up on keys and "push-style") — Spectraljump, May 05 '11 at 12:18
@Twodordan: Once the first `key()` function is executed the index is built and every next execution of the function comes almost for free. So, using keys is usually significantly more efficient when the `key()` function needs to be called at least twice. In this case it is called only once, and I have used it just for convenience. This solution could still be more efficient -- depends on the specific XSLT processor and one needs to perform measurements to find the exact performance difference and whether it is significant. — Dimitre Novatchev, May 05 '11 at 12:23
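
To make the index idea more concrete, here is a sketch of a variant that indexes the `group` elements themselves rather than their `gID` attributes, so the lookup returns the group and the attribute is read from it. The key name `kGroupByTokId` is hypothetical, and the empty `relatedStuff` template is an addition that appears in neither answer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <!-- Index every group once under the tokID of each of its references children -->
 <xsl:key name="kGroupByTokId" match="group" use="references/@tokID"/>

 <xsl:template match="tokens">
  <s id="everything_merged">
   <xsl:apply-templates/>
  </s>
 </xsl:template>

 <xsl:template match="token">
  <!-- key() returns the matching group; its gID attribute is read from there -->
  <tok id="{@ID}" gID="{key('kGroupByTokId', @ID)/@gID}">
   <xsl:apply-templates/>
  </tok>
 </xsl:template>

 <!-- Explicitly produce nothing for the lookup data -->
 <xsl:template match="relatedStuff"/>
</xsl:stylesheet>
```

Applied to the XML document above, this should produce the same three `tok` elements. Whether it is measurably faster than the nested-predicate version depends on the XSLT processor and the input size, as the comment above notes.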
