Oh no, not again a "XSLT 1.0 finding duplicates" quest. But I really mean it

Question

Here's my humble XML file:

<choice>
    <question>
        <text>one</text>
        <answer>
            <text>2</text>
        </answer>
        <answer>
            <text>2</text>
        </answer>
    </question>
    <question>
        <text>two</text>
        <answer>
            <text>d</text>
        </answer>
    </question>
    <question>
        <text>three</text>
        <answer>
            <text>1</text>
        </answer>
        <answer>
            <text>2</text>
        </answer>
    </question>
</choice>

And this is what I tried to find out if there's duplicate text in "question":

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    version="1.0">
    <xsl:template match="/choice">
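        <!-- compare question texts with each other only; following::text would also match answer/text elements -->
        <xsl:variable name="ok" select="count(question/text)=count(question/text[not(.=../following-sibling::question/text)])"/>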
        <xsl:copy-of select="$ok"/>
        <xsl:if test="not($ok)">
            <xsl:message terminate="yes">
                Error: Duplicate Question
            </xsl:message>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

Works fine, but how do I find out whether there are duplicates in the answer sections (question one in this example has the duplicate "2")?

Sorry for bothering but I'm really stuck here...

This may help: http://stackoverflow.com/questions/10216035/how-to-check-duplicate-element-values-using-xslt — kjhughes, Mar 03 '16 at 18:18
Please show the expected output 1) in case there are duplicates and 2) in case there are no duplicates. Thanks. — Mathias Müller, Mar 03 '16 at 18:53
@MathiasMüller: I'm just interested in a boolean variable containing either "false" if there are duplicates or "true" if there are none. In the example I just quit ungracefully after detecting that there's a duplicate. I'm interested neither in the number of duplicates nor in their values. — Denis Giffeler, Mar 05 '16 at 12:56
@kjhughes: Yes, I've seen that example and the solutions published by Dimitre Novatchev are almost always spot-on. But in this slightly different case I think that solution doesn't fit. — Denis Giffeler, Mar 05 '16 at 13:03

zx485 · Accepted Answer · 2016-03-14T09:20:40.810

I extended your test case by one question, resulting in the following XML:

<choice>
    <question>
        <text>one</text>
        <answer>
            <text>2</text>
        </answer>
        <answer>
            <text>2</text>
        </answer>
    </question>
    <question>
        <text>two</text>
        <answer>
            <text>d</text>
        </answer>
    </question>
    <question>
        <text>three</text>
        <answer>
            <text>1</text>
        </answer>
        <answer>
            <text>2</text>
        </answer>
    </question>
    <question>
        <text>three</text>
        <answer>
            <text>1</text>
        </answer>
        <answer>
            <text>d</text>
        </answer>
    </question>
</choice>

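The following XSLT isolates all duplicates by comparing each answer with the answers of all preceding questions. The nested xsl:for-each was necessary for keeping track of the preceding siblings. I had to halve the position number for the output, because the whitespace text nodes between the question elements are counted too unless they are stripped; this division is not necessary for the functionality: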

  <xsl:template match="/choice/question"> 
    <xsl:variable name="quesPos" select="position() div 2" />
    <xsl:for-each select="answer">
      <xsl:variable name="txt" select="text/text()" />
      <xsl:variable name="answPos" select="position()" />
      <xsl:for-each select="../preceding-sibling::*/answer">
        <xsl:if test="text/text() = $txt">
          <dup>
            <xsl:value-of select="concat('question[',$quesPos,']/answer[',$answPos,'] = ',$txt,' is a duplicate')" />
          </dup><xsl:text>&#10;</xsl:text>
        </xsl:if>
      </xsl:for-each> 
    </xsl:for-each> 
  </xsl:template>

The result of this template is

<?xml version="1.0"?>
<dup>question[3]/answer[2] = 2 is a duplicate</dup>
<dup>question[3]/answer[2] = 2 is a duplicate</dup>
<dup>question[4]/answer[1] = 1 is a duplicate</dup>
<dup>question[4]/answer[2] = d is a duplicate</dup>

Replacing the section inside <xsl:if> gives you the option of doing anything you like.

So if you only need the raw XSLT that isolates the duplicates, remove all variables except txt and replace everything inside xsl:if.
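
Since the comments ask only for a boolean, and the duplicate in the original example sits inside a single question, here is a minimal sketch of that narrower check. It is a variation on the template above, not a replacement for it: it compares each answer only with the earlier answers of the same question. The variable name answersOk and the message text are my own assumptions, modelled on the stylesheet in the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/choice">
    <!-- true when no answer repeats the text of an earlier answer
         within the same question (hypothetical variable name) -->
    <xsl:variable name="answersOk"
        select="not(question/answer[text = preceding-sibling::answer/text])"/>
    <xsl:if test="not($answersOk)">
      <!-- same ungraceful exit as in the question; message text is an assumption -->
      <xsl:message terminate="yes">Error: Duplicate Answer</xsl:message>
    </xsl:if>
    <!-- only reached when there are no duplicates -->
    <xsl:value-of select="$answersOk"/>
  </xsl:template>
</xsl:stylesheet>
```

The predicate relies on XPath 1.0 node-set comparison: it holds as soon as any earlier sibling answer has a text child with the same string value.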

A second approach, which will probably be faster, uses xsl:key for indexing. It gives you no position() information; if you need it, move the predicate out of the for-each. The key lookup is the technique known from Muenchian Grouping.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">
  <xsl:key name="answers" match="answer" use="text/text()"/>

  <xsl:template match="/choice"> 
    <xsl:for-each select="question/answer[not(generate-id() = generate-id(key('answers',text/text())[1]))]">
      <dup>
        <xsl:value-of select="concat(name(),'[',generate-id(),'] = ',text/text(),' is a duplicate')" />
      </dup><xsl:text>&#10;</xsl:text>     
    </xsl:for-each>
  </xsl:template> 
</xsl:stylesheet>
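
The key above indexes answers across the whole document. If duplicates should only count within one question, one possible sketch is a composite key that prefixes the answer text with the generated id of the parent question. The key name answersInQuestion, the variable name dups, and the '|' separator are assumptions of mine; the result is just the boolean asked for in the comments.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- key value = id of the parent question + separator + answer text
       (hypothetical key name) -->
  <xsl:key name="answersInQuestion" match="answer"
           use="concat(generate-id(..), '|', text)"/>

  <xsl:template match="/choice">
    <!-- every answer that is not the first one in its (question, text) group -->
    <xsl:variable name="dups"
        select="question/answer[generate-id() != generate-id(key('answersInQuestion', concat(generate-id(..), '|', text))[1])]"/>
    <!-- writes 'true' when there are no duplicates, 'false' otherwise -->
    <xsl:value-of select="not($dups)"/>
  </xsl:template>
</xsl:stylesheet>
```

For the extended XML above this should report false, because question one contains the answer "2" twice.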

The nested for-each approach works fine, even if the performance with larger datasets is possibly not the best. It would've been nice to have a solution that combines both aspects by finding duplicates in nested structures. But maybe that is something better left to be done under XSLT 2.0. — Denis Giffeler, Mar 05 '16 at 13:09
Answer accepted - on behalf of all those lost souls looking for duplicates: Thank you! — Denis Giffeler, Mar 05 '16 at 17:00
