Sort XML elements from several documents via Muenchian approach

Question

I have several XML documents. Each of these documents has some elements with the same name (let's say <word>).

An example of two of these XML documents would be (simplifying):

input1.xml

<xml>
<body>
    <word>A1</word>
    <word>A2</word>
    <word>B1</word>
</body>
</xml>

and

input2.xml

<xml>
<body>
    <word>A2</word>
    <word>B1</word>
    <word>B2</word>
</body>
</xml>

I need (via XSLT 1.0) to sort all elements of the two files, avoiding repetitions.

The output file I need is:

output1.xml

<xml>
<body>
    <word>A1</word>
    <word>A2</word>
    <word>B1</word>
    <word>B2</word>
</body>
</xml>

I have tried to do it by naming the input files as parameters in the XSLT file:

<xsl:param name="doc1">input1.xml</xsl:param>
<xsl:param name="doc2">input2.xml</xsl:param>

Then I created an <xsl:key> element:

<xsl:key name="words" match="word" use="."/>

And I applied some templates to the elements of a combination of the two files, like this:

<xsl:apply-templates select="(document($doc1)|document($doc2))//body"/>

Finally, in the template I used the above created key for applying the Muenchian approach:

<xsl:template match="body">
    <xsl:for-each select="//word[generate-id() = generate-id(key('words',.)[1])]">
        <xsl:sort select="."/>
        <xsl:value-of select="."/>
    </xsl:for-each>
</xsl:template>

This way I get a list of elements, but first I get all the elements of the input1.xml file, and then the elements of the input2.xml file:

<xml>
<body>
    <word>A1</word>
    <word>A2</word>
    <word>B1</word>
    <word>A2</word>
    <word>B1</word>
    <word>B2</word>
</body>
</xml>

I can't figure out how to get a list of non-repeated items from the two files.

Muenchian grouping is key based and keys work on a document, not a collection of documents. Can't you use XSLT 2.0 with Saxon or XmlPrime or Altova to make use of `xsl:for-each-group` or of `distinct-values`? — Martin Honnen, Aug 22 '16 at 11:44
Thank you for your quick answer. I'm not familiar with XSLT 2.0, so I'm not sure how much work it would be to port my code from 1.0 to 2.0, but I'll give it a try, knowing that the Muenchian approach won't work here. Thank you! — Josu Gomez, Aug 22 '16 at 11:48

score 1 · Answer 1 · edited May 23 '17 at 10:33

If you really need to do it with an XSLT 1.0 processor, then you can use

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <xsl:param name="input1-uri" select="'input1.xml'"/>
    <xsl:param name="input1" select="document($input1-uri)"/>

    <xsl:param name="input2-uri" select="'input2.xml'"/>
    <xsl:param name="input2" select="document($input2-uri)"/>

    <xsl:output indent="yes"/>

    <xsl:template match="/">
        <xml>
            <body>
                <xsl:variable name="words" select="$input1//word | $input2//word"/>
                <xsl:for-each select="$words">
                    <xsl:sort select="."/>
                    <xsl:if test="generate-id() = generate-id($words[. = current()])">
                        <xsl:copy-of select="."/>
                    </xsl:if>
                </xsl:for-each>
            </body>
        </xml>
    </xsl:template>

</xsl:stylesheet>

based on the answer https://stackoverflow.com/a/18958901/252228.
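
As a sketch of a variant (not taken from the linked answer): because `key()` only searches the document that contains the context node, the Muenchian test can still be applied inside each file separately, with one extra predicate to drop values from the second file that already occur in the first. The variable names `first-in-1` and `first-in-2` are made up for illustration, and the default file names are assumed to be the ones from the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

    <xsl:param name="input1-uri" select="'input1.xml'"/>
    <xsl:param name="input2-uri" select="'input2.xml'"/>

    <xsl:variable name="input1" select="document($input1-uri)"/>
    <xsl:variable name="input2" select="document($input2-uri)"/>

    <xsl:key name="words" match="word" use="."/>

    <xsl:output indent="yes"/>

    <xsl:template match="/">
        <xml>
            <body>
                <!-- key() looks only in the document of the context node,
                     so this keeps the first occurrence of each value within input1 -->
                <xsl:variable name="first-in-1"
                    select="$input1//word[generate-id() = generate-id(key('words', .)[1])]"/>
                <!-- same test within input2, then drop values already present in input1 -->
                <xsl:variable name="first-in-2"
                    select="$input2//word[generate-id() = generate-id(key('words', .)[1])]
                                         [not(. = $input1//word)]"/>
                <!-- a single for-each over the union, so the sort spans both files -->
                <xsl:for-each select="$first-in-1 | $first-in-2">
                    <xsl:sort select="."/>
                    <xsl:copy-of select="."/>
                </xsl:for-each>
            </body>
        </xml>
    </xsl:template>

</xsl:stylesheet>
```

The cross-file check `not(. = $input1//word)` is still a plain scan of the first file, so this only uses the key for the per-file deduplication; whether that is worth it probably depends on how large the files are.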

With XSLT 3.0 and Saxon 9.7 (available from http://saxon.sourceforge.net/) you can use

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:param name="input1-uri" select="'input1.xml'"/>
    <xsl:param name="input1" select="document($input1-uri)"/>

    <xsl:param name="input2-uri" select="'input2.xml'"/>
    <xsl:param name="input2" select="document($input2-uri)"/>

    <xsl:output indent="yes"/>

    <xsl:template match="/" name="main">
        <xml>
            <body>
                <xsl:for-each select="sort(distinct-values($input1//word | $input2//word))">
                    <word>
                        <xsl:value-of select="."/>
                    </word>
                </xsl:for-each>
            </body>
        </xml>
    </xsl:template>

</xsl:stylesheet>
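
If only an XSLT 2.0 processor is available, the `sort` function is not there, but the `xsl:for-each-group` instruction mentioned in the comment on the question should do the same job. A minimal sketch, assuming the same parameter names and the file names from the question:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:param name="input1-uri" select="'input1.xml'"/>
    <xsl:param name="input2-uri" select="'input2.xml'"/>

    <xsl:output indent="yes"/>

    <xsl:template match="/" name="main">
        <xml>
            <body>
                <!-- group the word elements of both documents by their string value -->
                <xsl:for-each-group
                    select="(document($input1-uri), document($input2-uri))//word"
                    group-by=".">
                    <!-- order the groups by that value -->
                    <xsl:sort select="current-grouping-key()"/>
                    <!-- emit one word element per distinct value -->
                    <word>
                        <xsl:value-of select="current-grouping-key()"/>
                    </word>
                </xsl:for-each-group>
            </body>
        </xml>
    </xsl:template>

</xsl:stylesheet>
```

Grouping works on an arbitrary sequence of nodes, so unlike a key it is not tied to a single document.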

Thank you very much!! I'll try this as fast as I can. — Josu Gomez, Aug 23 '16 at 14:26

score 0 · Answer 2 · answered Aug 22 '16 at 12:16

Here's how you can accomplish this task in XSLT 1.0:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:param name="doc2">input2.xml</xsl:param>

<xsl:key name="words" match="word" use="."/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<xsl:template match="body">
    <xsl:variable name="all-words">
        <xsl:copy-of select="word"/>
        <xsl:copy-of select="document($doc2)//body/word"/>
    </xsl:variable>
    <xsl:for-each select="exsl:node-set($all-words)/word[generate-id() = generate-id(key('words',.)[1])]">
        <xsl:sort select="."/>
        <xsl:copy-of select="."/>
    </xsl:for-each>
</xsl:template>

</xsl:stylesheet>

Note that this assumes you are processing input1.xml directly, and passing the path to input2.xml as a parameter.

Thank you very much!! I'll try this solution as soon as possible. — Josu Gomez, Aug 23 '16 at 14:26
