
I have two XML files that I want to merge according to the following criterion:

nodes1.xml file content:

<nodes>
  <node>
    <type>a</type>
    <name>joe</name>
  </node>
  <node>
    <type>b</type>
    <name>sam</name>
  </node>
  <node>
    <type>c</type>
    <name>pez</name>
  </node>
  <node>
    <type>g</type>
    <name>lua</name>
  </node>
  <node>
    <type>a</type>
    <name>tol</name>
  </node>
  <node>
    <type>c</type>
    <name>jua</name>
  </node>
</nodes>

nodes2.xml file content:

<nodes>
  <node>
    <type>a</type>
    <name>jill</name>
  </node>
  <node>
    <type>c</type>
    <name>imol</name>
  </node>
  <node>
    <type>h</type>
    <name>teli</name>
  </node>
  <node>
    <type>f</type>
    <name>jopp</name>
  </node>
  <node>
    <type>c</type>
    <name>zolh</name>
  </node>
</nodes>

With my XSLT stylesheet I get:

<?xml version="1.0" encoding="UTF-8"?>
<nodes>
  <node tipo="a">
    <name>joe</name>
    <name>tol</name>
    <name>jill</name>
  </node>
  <node tipo="c">
    <name>pez</name>
    <name>jua</name>
    <name>imol</name>
    <name>zolh</name>
  </node>
  <node tipo="h">
    <name>teli</name>
  </node>
  <node tipo="f">
    <name>jopp</name>
  </node>
</nodes>

I am looking for a solution with better performance. My current stylesheet is:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:variable name="Source2" select="document('nodes2.xml')/nodes/node"/>
  <xsl:variable name="Source1" select="document('nodes1.xml')/nodes/node"/>
  <xsl:template match="/nodes" >
    <nodes>
      <xsl:for-each-group select="node" group-by="type">
        <node tipo="{type}">
          <xsl:apply-templates select="$Source1[type=current-grouping-key()]/name"/>
          <xsl:apply-templates select="$Source2[type=current-grouping-key()]/name"/>
        </node>
      </xsl:for-each-group>
    </nodes>
  </xsl:template>

  <xsl:template match="name">
    <name><xsl:value-of select="."/></name>
  </xsl:template>
</xsl:stylesheet>
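To make the cost of this stylesheet visible, here is a rough Java/DOM analogue of what it does. It is only a sketch: the class and method names are hypothetical, the XML is inlined as strings instead of being read from nodes1.xml/nodes2.xml, and it models the algorithm, not how Saxon evaluates the stylesheet. For every distinct type in the primary input it rescans both node lists, just as the predicates `$Source1[type=current-grouping-key()]` and `$Source2[type=current-grouping-key()]` do, so the work grows roughly with (number of types) × (number of nodes) unless the processor optimizes those predicates itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of the question's approach; not Saxon internals.
public class NaiveMerge {

    static final String NODES1 = "<nodes>"
            + "<node><type>a</type><name>joe</name></node>"
            + "<node><type>b</type><name>sam</name></node>"
            + "<node><type>c</type><name>pez</name></node>"
            + "<node><type>g</type><name>lua</name></node>"
            + "<node><type>a</type><name>tol</name></node>"
            + "<node><type>c</type><name>jua</name></node>"
            + "</nodes>";

    static final String NODES2 = "<nodes>"
            + "<node><type>a</type><name>jill</name></node>"
            + "<node><type>c</type><name>imol</name></node>"
            + "<node><type>h</type><name>teli</name></node>"
            + "<node><type>f</type><name>jopp</name></node>"
            + "<node><type>c</type><name>zolh</name></node>"
            + "</nodes>";

    /** Parses an XML string and returns each node element as {type, name}, in document order. */
    static List<String[]> readNodes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("node");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element node = (Element) nodes.item(i);
                result.add(new String[] {
                        node.getElementsByTagName("type").item(0).getTextContent(),
                        node.getElementsByTagName("name").item(0).getTextContent()});
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** One full scan of both sources per group, like the two predicate selections. */
    static Map<String, List<String>> merge(List<String[]> source1, List<String[]> source2) {
        // Distinct types of the primary input in first-appearance order (the grouping keys).
        Set<String> types = new LinkedHashSet<>();
        for (String[] n : source2) {
            types.add(n[0]);
        }
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String type : types) {
            List<String> names = new ArrayList<>();
            for (String[] n : source1) {   // rescans nodes1 for every group
                if (n[0].equals(type)) names.add(n[1]);
            }
            for (String[] n : source2) {   // rescans nodes2 for every group
                if (n[0].equals(type)) names.add(n[1]);
            }
            groups.put(type, names);
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints {a=[joe, tol, jill], c=[pez, jua, imol, zolh], h=[teli], f=[jopp]}
        System.out.println(merge(readNodes(NODES1), readNodes(NODES2)));
    }
}
```

The result matches the sample output; the point is the nested loops, which repeat a full scan for each of the four groups.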

I run it with Saxon from the Java command line:

$ java net.sf.saxon.Transform nodes2.xml mysolution.xsl

It seems "a shame" to load the input file a second time into a variable, but I cannot figure out how to do it differently.

I would appreciate any help or pointers.

--Paulino

  • Why is it "a shame"? Are you encountering performance issues? Is the input XML the same as either nodes1.xml or nodes2.xml? If so, you only need one variable instead of two. – JLRishe Apr 25 '13 at 19:24
  • No. I'm just looking for a less "procedural" and maybe more "functional" solution that, in any case, avoids loading the same input file into a variable. I suppose there is some way to get better performance and lower memory usage. Note that the input file is only used for grouping (to find the distinct keys) ... I do not like that. – Paulino Huerta Apr 25 '13 at 21:43
  • Well, as I said, if the input file is the same as one of those two files, you can remove one of the variables and replace one of the `xsl:apply-templates` with ``. I doubt you're going to get it any less procedural than it is. – JLRishe Apr 25 '13 at 22:26


Assuming the second file (nodes2.xml) is the primary input to the transformation, you can use the following:

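<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- URI of the secondary document; can be overridden as a stylesheet parameter -->
  <xsl:param name="source1-uri" select="'nodes1.xml'"/>
  <xsl:variable name="doc1" select="doc($source1-uri)"/>

  <!-- Index node elements by the value of their type child -->
  <xsl:key name="by-type" match="nodes/node" use="type"/>

  <xsl:template match="/nodes" >
    <nodes>
      <!-- Population: the nodes1.xml nodes whose type also occurs in the primary input,
           followed by the primary input's own node elements -->
      <xsl:for-each-group select="key('by-type', node/type, $doc1), node" group-by="type">
        <node tipo="{current-grouping-key()}">
          <!-- The for expression keeps the order of current-group() -->
          <xsl:copy-of select="for $n in current-group() return $n/name"/>
        </node>
      </xsl:for-each-group>
    </nodes>
  </xsl:template>

</xsl:stylesheet>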

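I am not sure whether the order of the merged `name` elements matters to you, but to get the order shown in your sample result with Saxon 9.5, I had to use `<xsl:copy-of select="for $n in current-group() return $n/name"/>` instead of the shorter and more usual `<xsl:copy-of select="current-group()/name"/>`. The path operator `/` returns its result in document order, and the relative order of nodes from different documents is implementation-dependent, whereas the `for` expression preserves the order of `current-group()`.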

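This solution should be more efficient, mainly because it groups all relevant input nodes in a single pass and then simply uses `current-group()` instead of selecting the nodes again with a predicate for each group. The key also gives indexed access to the nodes1.xml nodes by type, so that document does not have to be scanned once per group.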
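For comparison with a per-group rescan, here is a Java/DOM sketch of the same single-pass idea. It is hypothetical (made-up class and method names, XML inlined as strings) and models the algorithm only, not how Saxon implements keys or grouping. One deliberate swap: instead of indexing nodes1.xml by type the way `xsl:key` does, it collects the primary input's types in a `HashSet` and filters nodes1.xml once; both are single hashed passes and yield the same population. That population is concatenated in the same order as the `select` of `xsl:for-each-group` and grouped with a `LinkedHashMap`, which keeps groups in order of first appearance and members in population order, as `group-by` does.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration of the answer's approach; not Saxon internals.
public class KeyedMerge {

    static final String NODES1 = "<nodes>"
            + "<node><type>a</type><name>joe</name></node>"
            + "<node><type>b</type><name>sam</name></node>"
            + "<node><type>c</type><name>pez</name></node>"
            + "<node><type>g</type><name>lua</name></node>"
            + "<node><type>a</type><name>tol</name></node>"
            + "<node><type>c</type><name>jua</name></node>"
            + "</nodes>";

    static final String NODES2 = "<nodes>"
            + "<node><type>a</type><name>jill</name></node>"
            + "<node><type>c</type><name>imol</name></node>"
            + "<node><type>h</type><name>teli</name></node>"
            + "<node><type>f</type><name>jopp</name></node>"
            + "<node><type>c</type><name>zolh</name></node>"
            + "</nodes>";

    /** Parses an XML string and returns each node element as {type, name}, in document order. */
    static List<String[]> readNodes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("node");
            List<String[]> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element node = (Element) nodes.item(i);
                result.add(new String[] {
                        node.getElementsByTagName("type").item(0).getTextContent(),
                        node.getElementsByTagName("name").item(0).getTextContent()});
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Filter once, then group the combined population in a single pass. */
    static Map<String, List<String>> merge(List<String[]> doc1, List<String[]> input) {
        // Types occurring in the primary input (hash lookup instead of the key index).
        Set<String> inputTypes = new HashSet<>();
        for (String[] n : input) {
            inputTypes.add(n[0]);
        }
        // Population: matching doc1 nodes in document order, then all primary-input nodes.
        List<String[]> population = new ArrayList<>();
        for (String[] n : doc1) {
            if (inputTypes.contains(n[0])) population.add(n);
        }
        population.addAll(input);
        // group-by="type": first-appearance group order, population order within each group.
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String[] n : population) {
            groups.computeIfAbsent(n[0], k -> new ArrayList<>()).add(n[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints {a=[joe, tol, jill], c=[pez, jua, imol, zolh], h=[teli], f=[jopp]}
        System.out.println(merge(readNodes(NODES1), readNodes(NODES2)));
    }
}
```

Every node is visited a constant number of times, which is the property that makes the grouped version cheaper than re-selecting with predicates.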

Martin Honnen