There are many questions about removing duplicate elements when you can group them by a known attribute or value. In my case, however, the attributes are generated dynamically in the XSLT, and I don't want to hard-code every attribute of every element as a grouping key.

How do you remove duplicate elements without knowing their attributes in advance? So far, I've tried calling generate-id() on each element and grouping by the result, but generate-id() doesn't return the same ID for elements that have the same attributes:

<xsl:template match="root">
    <xsl:variable name="tempIds">
        <xsl:for-each select="./*">
            <xsl:copy>
                <xsl:copy-of select="@*"/>
                <xsl:attribute name="tempID">
                    <xsl:value-of select="generate-id(.)"/>
                </xsl:attribute>
                <xsl:copy-of select="node()"/>
            </xsl:copy>
        </xsl:for-each>
    </xsl:variable>
    <xsl:for-each-group select="$tempIds" group-by="@tempID">
        <xsl:sequence select="."/>
    </xsl:for-each-group>
</xsl:template>

Test data:

<root>
    <child1>
        <etc/>
    </child1>
    <dynamicElement1 a="2" b="3"/>
    <dynamicElement2 c="3" d="4"/>
    <dynamicElement2 c="3" d="5"/>
    <dynamicElement1 a="2" b="3"/>
</root>

The desired result keeps only one of the two dynamicElement1 elements:

<root>
    <child1>
        <etc/>
    </child1>
    <dynamicElement1 a="2" b="3"/>
    <dynamicElement2 c="3" d="4"/>
    <dynamicElement2 c="3" d="5"/>
</root>
CC Inc
  • Which XSLT processor do you use? And which result do you want for the test data? Sounds as if XSLT 3 with `xsl:for-each-group select="*" composite="yes" group-by="@*"` might be an option. But I don't see how an element can have two `c` attributes, like your `dynamicElement2` elements. – Martin Honnen Jun 02 '18 at 17:07
  • @MartinHonnen My apologies, that was a typo. My stylesheet uses XSLT 2.0, but I can create another stylesheet in 3.0 that'll chain onto the other one if that'll be easier. I'm using SAXON EE as my processor – CC Inc Jun 02 '18 at 17:12
  • *"the problem is generate-id isn't generating the same ID for elements with the same attributes"* Hm, where did you get the notion from that that would be the case? – Tomalak Jun 02 '18 at 17:13
  • Well the whole point of `generate-id()` is to generate unique IDs, so this shouldn't be a surprise. – Tomalak Jun 02 '18 at 17:29
  • @Tomalak I just saw [this](https://stackoverflow.com/a/19660916/1482644) answer. I misunderstood and thought that generate-id would return the same value for a node that has a similar value and attributes, but I see now that it also takes the context into consideration. – CC Inc Jun 02 '18 at 17:33

2 Answers

In XSLT 3, as shown in https://xsltfiddle.liberty-development.net/pPqsHTi, you can group on a composite key made up of all attributes of each element, for example:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output indent="yes"/>

  <xsl:template match="root">
      <xsl:copy>
          <xsl:for-each-group select="*" composite="yes" group-by="@*">
              <xsl:sequence select="."/>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Note that attributes are technically unordered, so it is safer to group by the attributes sorted by namespace URI and local name. This can be done in XSLT 3 without higher-order functions, as in https://xsltfiddle.liberty-development.net/pPqsHTi/2:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mf="http://example.com/mf"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:output indent="yes"/>

  <xsl:function name="mf:node-sort" as="node()*">
      <xsl:param name="input-nodes" as="node()*"/>
      <xsl:perform-sort select="$input-nodes">
          <xsl:sort select="namespace-uri()"/>
          <xsl:sort select="local-name()"/>
      </xsl:perform-sort>
  </xsl:function>

  <xsl:template match="root">
      <xsl:copy>
          <xsl:for-each-group select="*" composite="yes" group-by="mf:node-sort(@*)">
              <xsl:sequence select="."/>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Or, since Saxon EE supports higher-order functions, you could do the same more simply with the sort() function:

<xsl:template match="root">
    <xsl:copy>
        <xsl:for-each-group select="*" composite="yes" group-by="sort(@*, (), function($att) { namespace-uri($att), local-name($att) })">
            <xsl:sequence select="."/>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>
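
Since the question mentions that the existing stylesheet is XSLT 2.0, where the composite attribute is not available, the same idea can be approximated by turning the sorted attributes into a single string key. The following is only a minimal sketch under stated assumptions: the function name mf:attribute-key and the key format (namespace URI, local name, value length, and value joined with separator characters) are illustrative choices and are not part of the stylesheets above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="xs mf"
    version="2.0">

  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: builds one string from all attributes of an element,
       sorted by namespace URI and local name so attribute order does not matter.
       The length prefix keeps values that contain '|' or '=' from colliding. -->
  <xsl:function name="mf:attribute-key" as="xs:string">
      <xsl:param name="element" as="element()"/>
      <xsl:variable name="parts" as="xs:string*">
          <xsl:for-each select="$element/@*">
              <xsl:sort select="namespace-uri()"/>
              <xsl:sort select="local-name()"/>
              <xsl:sequence select="concat('{', namespace-uri(), '}', local-name(),
                                           '=', string-length(string(.)), ':', string(.))"/>
          </xsl:for-each>
      </xsl:variable>
      <xsl:sequence select="string-join($parts, '|')"/>
  </xsl:function>

  <xsl:template match="root">
      <xsl:copy>
          <!-- Each group holds the children with identical attribute sets;
               only the first element of each group is output. -->
          <xsl:for-each-group select="*" group-by="mf:attribute-key(.)">
              <xsl:sequence select="."/>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Like the stylesheets above, this sketch compares attributes only, so two children with different element names but identical attributes (or no attributes at all) would fall into the same group. If that matters, the element name could be added to the key.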
Martin Honnen
  • This works great, thanks! Just for my knowledge, any idea why my `generate-id` solution didn't work? – CC Inc Jun 02 '18 at 17:22
  • @CCInc, `generate-id` generates a distinct id for any distinct node, based on node identity. So two nodes in the same document never have the same id. – Martin Honnen Jun 02 '18 at 17:27
  • I would definitely not rely on the attributes being in the same order on two different elements. Technically using `string(node-name())` as a sort key is also unsafe, because it relies on the same namespace bindings being in scope for both elements. Safer is to use the composite sort key `(namespace-uri($att), local-name($att))`. – Michael Kay Jun 02 '18 at 22:59
  • @MichaelKay, you are right, I have adapted the code samples to incorporate your suggestion. – Martin Honnen Jun 03 '18 at 11:06
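
The distinction drawn in the comments above, between node identity (what `generate-id` reflects) and equality of content, can be illustrated with the standard XPath 2.0 function `deep-equal`, which compares two elements by name, attributes (regardless of order), and children. The stylesheet below is only a sketch of that comparison applied to the test data from the question. It is a different technique from the grouping shown in the answer, and it compares every child with all of its preceding siblings, so it may be slow on large inputs.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <xsl:output indent="yes"/>

  <xsl:template match="root">
      <xsl:copy>
          <!-- Keep a child only if no preceding sibling is deep-equal to it.
               Inside the predicate, "." is the child being tested and $p
               ranges over its preceding siblings. -->
          <xsl:sequence select="*[not(some $p in preceding-sibling::*
                                      satisfies deep-equal(., $p))]"/>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Unlike the attribute-based grouping, this also takes the element name and child content into account, so two elements are treated as duplicates only when they match completely.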
You may try this:

<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
<xsl:template match="root/*[@a = following-sibling::*/@a] | root/*[@c = following-sibling::*/@c and @d = following-sibling::*/@d]"/>
imran