I have a directory of XML files that need to be merged. The files are named in the order in which they should be merged, for example file1.xml, file2.xml, file3.xml, and so on. The number of files varies.

This code has successfully merged the files:

<xsl:for-each select="
    collection(iri-to-uri('/home/book/?select=*.html;recurse=yes'))">
    <xsl:apply-templates select="node()|@*"/>
</xsl:for-each>

Concern/Question: Were the files merged in order only by coincidence? Do I need to change the code to enforce an ordered reading of the files? If so, any suggestions?

(using Saxon)

UPDATE: I believe Ian's reply to be correct, that is: no guarantee of ordering. I'm working on code like this (to be refactored and validated). I'm not sure this is a robust approach though.

<!-- load the directory file names into a variable -->
<xsl:variable name="file-names">
    <collection>
        <xsl:for-each select="collection('/home/book/?select=*.html')">
            <file>
                <xsl:value-of select="tokenize(document-uri(.), '/')[last()]"/>
            </file> 
        </xsl:for-each>
    </collection>        
</xsl:variable>

<!-- open the files in a sorted order -->
<xsl:template match="/">
    <xsl:for-each select="$file-names/collection/file">
        <xsl:sort select="replace(., '[^\d]', '')" data-type="number" />
        <xsl:variable name="filename" select="concat('/home/book/', . )"/>
        <xsl:copy-of select="doc($filename)"/>
    </xsl:for-each>
</xsl:template>
Paulb

2 Answers

According to the source code, the default collection URI resolver delegates to File.listFiles, which provides no guarantee of ordering (in general, though it may be more consistent on some platforms than on others).

Ian Roberts
    More specifically (a) the spec of collection() guarantees no order, (b) on Saxon-EE the files selected by File.listFiles() are parsed asynchronously in multiple threads, and the order of processing is the order in which parsing completes, which may be different from the order in which it starts. – Michael Kay Dec 08 '14 at 14:43
The <xsl:sort select="tokenize(document-uri(.), '/')[last()]"/> line works fine in a process I use that had a similar need (sorting the transformed XML output by input filename). Note that I've adapted Martin's line by closing the select attribute with its quotation mark (") and by making the element self-closing.
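
For reference, here is how that sort line might sit in a complete stylesheet. This is only a sketch under assumptions: it reuses the /home/book/ directory and the Saxon ?select= pattern from the question, and the merged wrapper element is a made-up name. A numeric primary key is added because a plain string sort on the file name places file10 before file2.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:template match="/">
        <!-- "merged" is a hypothetical wrapper so the output has a single root -->
        <merged>
            <xsl:for-each select="collection('/home/book/?select=*.html')">
                <!-- Primary key: the digits of the file name, compared as a
                     number, so file2 comes before file10 -->
                <xsl:sort select="number(replace(tokenize(document-uri(.), '/')[last()], '\D', ''))"/>
                <!-- Secondary key: the file name itself, as a tie-breaker -->
                <xsl:sort select="tokenize(document-uri(.), '/')[last()]"/>
                <!-- Copy the children of each document node in sorted order -->
                <xsl:copy-of select="node()"/>
            </xsl:for-each>
        </merged>
    </xsl:template>

</xsl:stylesheet>
```

Sorting the collection directly avoids building an intermediate list of file names and reopening each file with doc(). If a file name contains digits elsewhere (a version suffix, say), replace() folds them into the key, so the expression would need tightening for such names.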

ewh_in_MT