XSLT remove duplicate and select higher value

Question

I am new to XSLT and am looking for help removing duplicate <Emp> elements from an XML document based on the combined value of their child elements. From each group of elements that share this combined value, only the one with the highest AIB_Position/AIB value must be output. Below are my sample XML document and the corresponding desired output.

<Row_entry>
<Employees>
    <Emp>
        <Emp_id>E1</Emp_id>
        <Emp_Name>Name1</Emp_Name>
        <Country>C1</Country>
        <AIB_Position>
            <AIB>1500</AIB>
        </AIB_Position>
    </Emp>
    <Emp>
        <Emp_id>E2</Emp_id>
        <Emp_Name>Name2</Emp_Name>
        <Country>C2</Country>
        <AIB_Position>
            <AIB>1700</AIB>
        </AIB_Position>
    </Emp>
    <Emp>
        <Emp_id>E2</Emp_id>
        <Emp_Name>Name2</Emp_Name>
        <Country>C2</Country>
        <AIB_Position>
            <AIB>1800</AIB>
        </AIB_Position>
    </Emp>
 </Employees>
</Row_entry>

Desired output (duplicate Emp elements removed based on the combined <Emp_id>, <Emp_Name>, <Country> value):

<Row_entry>
 <Employees>
    <Emp>
        <Emp_id>E1</Emp_id>
        <Emp_Name>Name1</Emp_Name>
        <Country>C1</Country>
        <AIB_Position>
            <AIB>1500</AIB>
        </AIB_Position>
    </Emp>
    <Emp>
        <Emp_id>E2</Emp_id>
        <Emp_Name>Name2</Emp_Name>
        <Country>C2</Country>
        <AIB_Position>
            <AIB>1800</AIB>
        </AIB_Position>
    </Emp>
 </Employees>
</Row_entry>

The Emp id, name and country are the same and I want the higher AIB value. Kindly help. — gud.u, Jun 18 '19 at 09:39
Grouping is covered for instance in https://stackoverflow.com/tags/xslt-grouping/info, so start there to get an idea, then show us your attempt if you can't work it out. — Martin Honnen, Jun 18 '19 at 09:42
@Martin I have tried many approaches, but for lack of expertise I was not able to achieve it. — gud.u, Jun 18 '19 at 09:58
Did you read and try my solution? Did it work for you? Does it solve your problem? If not, what additional difficulties did you face? — Dimitre Novatchev, Jun 29 '19 at 18:27

Dimitre Novatchev · Answer 1 · 2019-06-19T19:49:26.110

I think you want this (directly using the XPath 2.0 max() function):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output indent="yes"/>

  <!-- Identity template: copies Row_entry and all other nodes unchanged,
       so the wrapper element appears in the output as in the desired result -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Employees">
      <xsl:copy>
          <xsl:for-each-group select="Emp" group-by="concat(Emp_id, '+', Emp_Name, '+', Country)">
            <xsl:copy-of select="current-group()
                 [AIB_Position/AIB/number() = max(current-group()/AIB_Position/AIB/number())][1]"/>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

And if you suspect that your XSLT processor evaluates max() more than once per group, use this transformation, which explicitly computes the maximum only once:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xsl:output indent="yes"/>

  <!-- Identity template: copies Row_entry and all other nodes unchanged,
       so the wrapper element appears in the output as in the desired result -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Employees">
      <xsl:copy>
          <xsl:for-each-group select="Emp"
           group-by="concat(Emp_id, '+', Emp_Name, '+', Country)">
            <xsl:copy-of select=
            "for $max in max(current-group()/AIB_Position/AIB/number())
              return
                current-group()[AIB_Position/AIB/number() = $max][1]"/>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

score 0 · Answer 2 · answered Jun 18 '19 at 12:27

In XSLT 2 or later, use for-each-group, for instance in XSLT 3 with a composite grouping key, then sort each group in descending order of the AIB value and output the first item, i.e. the one with the maximum value:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output indent="yes"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="Employees">
      <xsl:copy>
          <xsl:for-each-group select="Emp" composite="yes" group-by="Emp_id, Emp_Name, Country">
              <xsl:for-each select="current-group()">
                  <!-- Convert to a number; untyped values would otherwise be sorted as strings -->
                  <xsl:sort select="xs:integer(AIB_Position/AIB)" order="descending"/>
                  <xsl:if test="position() = 1">
                      <xsl:copy-of select="."/>
                  </xsl:if>
              </xsl:for-each>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

With an XSLT 3 processor that supports the higher-order sort function, you could shorten that code to:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

    <xsl:output indent="yes"/>

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:template match="Employees">
        <xsl:copy>
            <xsl:for-each-group select="Emp" composite="yes" group-by="Emp_id, Emp_Name, Country">
                <xsl:sequence select="sort(current-group(), (), function($emp) { xs:integer($emp/AIB_Position/AIB) })[last()]"/>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

https://stackoverflow.com/tags/xslt-grouping/info has some details on how to implement a composite grouping key of XSLT 3 in XSLT 2 by string-joining the components of the key, if you are limited to XSLT 2.
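
For an XSLT 2 processor, the string-joined key mentioned above could look roughly like the sketch below. It is a port of the first stylesheet in this answer, not tested against any particular processor. The `|` separator is an assumption and should be a character that cannot occur in the key values. An explicit identity template stands in for `xsl:mode`, which XSLT 2 does not have.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <xsl:output indent="yes"/>

  <!-- Identity template: XSLT 2.0 has no xsl:mode on-no-match="shallow-copy" -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Employees">
    <xsl:copy>
      <!-- '|' is an assumed separator; choose one that cannot occur in the data -->
      <xsl:for-each-group select="Emp"
          group-by="string-join((Emp_id, Emp_Name, Country), '|')">
        <xsl:for-each select="current-group()">
          <!-- number() forces a numeric comparison instead of a string comparison -->
          <xsl:sort select="number(AIB_Position/AIB)" order="descending"/>
          <xsl:if test="position() = 1">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This assumes each Emp has exactly one AIB_Position/AIB child; with more than one, the call to number() would raise a type error.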

Hi Martin, sorting -- O(N*log(N)) is not necessary to find maximum -- and there is a convenient XPath 2.0 function for this, called `max()` -- (O(N)) — Dimitre Novatchev, Jun 19 '19 at 19:32
@DimitreNovatchev, thanks, good point, although at least for `current-group()[AIB_Position/AIB/number() = max(current-group()/AIB_Position/AIB/number())]` it will be interesting to check whether processors rewrite that to only compute the maximum value once. But as I used XSLT 3 anyway where `let` exists I should have used that. By the way, will there be a part 2 of your Pluralsight course on XSLT 3? — Martin Honnen, Jun 19 '19 at 19:56
@MartinHonnen As pointed out earlier, and per the code, the **for expression** can be used for cases like this in XPath 2.0 — Dimitre Novatchev, Jun 27 '19 at 23:48
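
For readers on XSLT 3, the `let` expression mentioned in the comments could be used as in the following sketch. It combines the composite grouping key from the second answer with the max() approach from the first, and binds the group maximum to a variable so that it is written only once. This is an illustration under the assumption that every AIB value is numeric, not a tested implementation.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="3.0">

  <xsl:output indent="yes"/>

  <!-- Copy everything that is not matched explicitly (Row_entry etc.) -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="Employees">
    <xsl:copy>
      <xsl:for-each-group select="Emp" composite="yes"
                          group-by="Emp_id, Emp_Name, Country">
        <!-- Bind the group maximum once, then pick the first Emp that has it -->
        <xsl:sequence select="
          let $max := max(current-group()/AIB_Position/AIB/number())
          return current-group()[AIB_Position/AIB/number() = $max][1]"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

If a group contains a non-numeric AIB value, number() yields NaN, the maximum becomes NaN, and no Emp is selected for that group, so input validation may be needed for real data.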
