Is there a way to make xsl:result-document overwrite or skip files when several output files have the same URI? I don't think an example is needed: my database has duplicate entries in it. I know I could add an id to each file name and then strip the id from the names of 60000 files afterwards.

Best regards.

RyosanCiffer
  • You can't write to the same URI twice in the same transformation. If you want to identify duplicates then of course in XSLT 2 or later you can use `xsl:for-each-group select="//record" group-by="fname"`, only you seem to be also working on some streaming solution for your large input so it might be more complicated. You will need to post some details to get more concrete help. – Martin Honnen Aug 15 '17 at 08:45
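The grouping approach mentioned in the comment above could look roughly like the following. This is a minimal non-streaming sketch: it assumes the `root/record/fname` input layout used in the sample further down (the question itself gives no sample), and it reads the whole document into memory, so it may not suit very large inputs.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <!-- one group per distinct fname, so each URI is written exactly once -->
        <xsl:for-each-group select="root/record" group-by="fname">
            <xsl:result-document href="{current-grouping-key()}.txt" method="text">
                <!-- inside for-each-group the context item is the first record of the group -->
                <xsl:value-of select="* except fname" separator=","/>
            </xsl:result-document>
        </xsl:for-each-group>
    </xsl:template>

</xsl:stylesheet>
```

This gives "skip" behaviour, because only the first record with a given name is written. For "overwrite" behaviour, selecting `current-group()[last()]/(* except fname)` instead would write the last duplicate.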

2 Answers

It seems that with XSLT 3.0 you can use xsl:try/xsl:catch to catch the error raised by writing to the same output URI twice. Given the stylesheet

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:math="http://www.w3.org/2005/xpath-functions/math"
    xmlns:err="http://www.w3.org/2005/xqt-errors"
    exclude-result-prefixes="xs math"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:output method="text"/>
    <xsl:strip-space elements="*"/>

    <xsl:template match="/">
        <xsl:apply-templates select="copy-of(root/record)" mode="result"/>
    </xsl:template>

    <xsl:template match="record" mode="result">
        <xsl:try>
            <xsl:result-document href="{fname}.txt" method="text">
                <xsl:value-of select="* except fname" separator=","/>
            </xsl:result-document>
            <xsl:catch errors="err:XTDE1490">
                <xsl:message select="'Attempt to write more than once to ', fname"/>
            </xsl:catch>
        </xsl:try>
    </xsl:template>

</xsl:stylesheet>

and an input like

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <record>
        <fname>result1</fname>
        <foo>1</foo>
        <bar>a</bar>
    </record>
    <record>
        <fname>result2</fname>
        <foo>2</foo>
        <bar>b</bar>
    </record>
    <record>
        <fname>result1</fname>
        <foo>1</foo>
        <bar>a</bar>
    </record>
</root>

Saxon 9.8 EE processes the input with streaming and writes two result files; the error raised by the second attempt to write result1.txt, while processing the third record, is caught.

As for @MichaelKay's comment that it is implementation-dependent which of the duplicates gets caught: I agree, but if that matters we can simply replace

<xsl:template match="/">
    <xsl:apply-templates select="copy-of(root/record)" mode="result"/>
</xsl:template>

with a use of xsl:iterate

<xsl:template match="/">
    <xsl:iterate select="root/record">
        <xsl:apply-templates select="copy-of()" mode="result"/>
    </xsl:iterate>
</xsl:template>

That way, I think, the records are processed sequentially.
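Since xsl:iterate goes through the records in order, a variation (not part of the approach above, just a sketch) is to avoid the error altogether by remembering the names already written in an iteration parameter. The parameter name `seen` and the variables `rec` and `name` are made up for illustration, and the stylesheet assumes the same `root/record/fname` input as the sample.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:map="http://www.w3.org/2005/xpath-functions/map"
    exclude-result-prefixes="xs map"
    version="3.0">

    <xsl:mode streamable="yes"/>

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:iterate select="root/record">
            <!-- hypothetical parameter: file names written so far, initially empty -->
            <xsl:param name="seen" as="map(xs:string, xs:boolean)" select="map{}"/>
            <!-- in-memory copy of the current record, so it can be read more than once -->
            <xsl:variable name="rec" select="copy-of()"/>
            <xsl:variable name="name" select="string($rec/fname)"/>
            <xsl:choose>
                <xsl:when test="map:contains($seen, $name)">
                    <!-- duplicate: write nothing; $seen carries over unchanged -->
                    <xsl:message select="'Skipping duplicate', $name"/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:result-document href="{$name}.txt" method="text">
                        <xsl:value-of select="$rec/(* except fname)" separator=","/>
                    </xsl:result-document>
                    <!-- record the name before moving on to the next record -->
                    <xsl:next-iteration>
                        <xsl:with-param name="seen" select="map:put($seen, $name, true())"/>
                    </xsl:next-iteration>
                </xsl:otherwise>
            </xsl:choose>
        </xsl:iterate>
    </xsl:template>

</xsl:stylesheet>
```

With this sketch the first record for each name wins, and the map only holds the file names, which should stay small even for 60000 files.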

Martin Honnen
  • And note that since the order of processing isn't defined, it's implementation-dependent which of the duplicates will be caught this way. – Michael Kay Aug 15 '17 at 10:41
  • @MichaelKay, I didn't think about that but I think using `xsl:iterate` can avoid that problem if needed. – Martin Honnen Aug 16 '17 at 17:23

A trick you can use with Saxon is to make the URI unique by adding a query part, href="{fname}.txt?n={position()}", and then strip this off in the OutputURIResolver.
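On the stylesheet side, that might look like the sketch below, which again assumes the `root/record/fname` input from the other answer. The resolver that strips the `?n=...` part is Java code registered with Saxon and is not shown here; which duplicate ends up in the final file depends on how that resolver handles repeated names.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

    <xsl:output method="text"/>

    <xsl:template match="/">
        <xsl:apply-templates select="root/record" mode="result"/>
    </xsl:template>

    <xsl:template match="record" mode="result">
        <!-- position() differs for every selected record, so no two hrefs are equal -->
        <xsl:result-document href="{fname}.txt?n={position()}" method="text">
            <xsl:value-of select="* except fname" separator=","/>
        </xsl:result-document>
    </xsl:template>

</xsl:stylesheet>
```

Note that `position()` only varies here because all records are selected in a single xsl:apply-templates call; if each record were applied on its own, it would always be 1.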

Michael Kay