I am new to XSLT and am looking for help removing duplicate <Emp> elements from an XML document on the basis of the combined value of their child elements. From each group of elements with the same combined value, the one with the highest value for AIB_Position/AIB must be output. Below are my sample XML document and the corresponding desired output.

<Row_entry>
<Employees>
    <Emp>
        <Emp_id>E1</Emp_id>
        <Emp_Name>Name1</Emp_Name>
        <Country>C1</Country>
        <AIB_Position>
            <AIB>1500</AIB>
        </AIB_Position>
    </Emp>
    <Emp>
        <Emp_id>E2</Emp_id>
        <Emp_Name>Name2</Emp_Name>
        <Country>C2</Country>
        <AIB_Position>
            <AIB>1700</AIB>
        </AIB_Position>
    </Emp>
    <Emp>
        <Emp_id>E2</Emp_id>
        <Emp_Name>Name2</Emp_Name>
        <Country>C2</Country>
        <AIB_Position>
            <AIB>1800</AIB>
        </AIB_Position>
    </Emp>
 </Employees>
</Row_entry>

Desired output (duplicate Emp elements removed based on the combined <Emp_id>, <Emp_Name>, <Country> value):

<Row_entry>
 <Employees>
    <Emp>
        <Emp_id>E1</Emp_id>
        <Emp_Name>Name1</Emp_Name>
        <Country>C1</Country>
        <AIB_Position>
            <AIB>1500</AIB>
        </AIB_Position>
    </Emp>
    <Emp>
        <Emp_id>E2</Emp_id>
        <Emp_Name>Name2</Emp_Name>
        <Country>C2</Country>
        <AIB_Position>
            <AIB>1800</AIB>
        </AIB_Position>
    </Emp>
 </Employees>
</Row_entry>
Dimitre Novatchev
gud.u

2 Answers

I think you want this (directly using the XPath 2.0 max() function):

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

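  <xsl:output indent="yes"/>

  <!-- Identity template: copies Row_entry and any other node not matched below -->
  <xsl:template match="node()|@*">
      <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
  </xsl:template>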

  <xsl:template match="Employees">
      <xsl:copy>
          <xsl:for-each-group select="Emp" group-by="concat(Emp_id, '+', Emp_Name, '+', Country)">
            <xsl:copy-of select="current-group()
                 [AIB_Position/AIB/number() = max(current-group()/AIB_Position/AIB/number())][1]"/>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

And if you suspect your XSLT processor of idiocy, such as calculating max() more than once, use this more explicit transformation, which binds the maximum to a variable with a for expression:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">

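  <xsl:output indent="yes"/>

  <!-- Identity template: copies Row_entry and any other node not matched below -->
  <xsl:template match="node()|@*">
      <xsl:copy>
          <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
  </xsl:template>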

  <xsl:template match="Employees">
      <xsl:copy>
          <xsl:for-each-group select="Emp"
           group-by="concat(Emp_id, '+', Emp_Name, '+', Country)">
            <xsl:copy-of select=
            "for $max in max(current-group()/AIB_Position/AIB/number())
              return
                current-group()[AIB_Position/AIB/number() = $max][1]"/>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Dimitre Novatchev

In XSLT 2 or later, use for-each-group. For instance, in XSLT 3 you can group with a composite grouping key, then sort each group and output the Emp element with the maximum AIB value:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output indent="yes"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="Employees">
      <xsl:copy>
          <xsl:for-each-group select="Emp" composite="yes" group-by="Emp_id, Emp_Name, Country">
              <xsl:for-each select="current-group()">
                  <!-- Convert to a number: untyped values would otherwise be sorted as strings -->
                  <xsl:sort select="xs:integer(AIB_Position/AIB)" order="descending"/>
                  <xsl:if test="position() = 1">
                      <xsl:copy-of select="."/>
                  </xsl:if>
              </xsl:for-each>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

With an XSLT 3 processor supporting the higher-order sort function, you could shorten that code to:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

    <xsl:output indent="yes"/>

    <xsl:mode on-no-match="shallow-copy"/>

    <xsl:template match="Employees">
        <xsl:copy>
            <xsl:for-each-group select="Emp" composite="yes" group-by="Emp_id, Emp_Name, Country">
                <xsl:sequence select="sort(current-group(), (), function($emp) { xs:integer($emp/AIB_Position/AIB) })[last()]"/>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

</xsl:stylesheet>

https://stackoverflow.com/tags/xslt-grouping/info has some details on how to implement a composite grouping key of XSLT 3 in XSLT 2 by string-joining the components of the key, if you are limited to XSLT 2.
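
As a rough sketch of that XSLT 2 approach, the composite key can be emulated by joining the three child values into one string. The `|` separator here is an assumption; choose a character that cannot occur in the key values. If one of the three children can be missing, the joined key may become ambiguous. The sort key is converted with number() so that the comparison is numeric rather than string-based.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:output indent="yes"/>

  <!-- Identity template: copies everything not handled below -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="Employees">
    <xsl:copy>
      <!-- '|' is an assumed separator that must not appear in the key values -->
      <xsl:for-each-group select="Emp"
          group-by="string-join((Emp_id, Emp_Name, Country), '|')">
        <xsl:for-each select="current-group()">
          <!-- Numeric sort, highest AIB first -->
          <xsl:sort select="number(AIB_Position/AIB)" order="descending"/>
          <xsl:if test="position() = 1">
            <xsl:copy-of select="."/>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```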

Martin Honnen
  • Thanks Martin, it's working. – gud.u Jun 18 '19 at 14:55
  • Hi Martin, sorting, which is O(N*log(N)), is not necessary to find the maximum, and there is a convenient XPath 2.0 function for this, called `max()`, which is O(N). – Dimitre Novatchev Jun 19 '19 at 19:32
  • @DimitreNovatchev, thanks, good point, although at least for `current-group()[AIB_Position/AIB/number() = max(current-group()/AIB_Position/AIB/number())]` it will be interesting to check whether processors rewrite that to only compute the maximum value once. But as I used XSLT 3 anyway where `let` exists I should have used that. By the way, will there be a part 2 of your Pluralsight course on XSLT 3? – Martin Honnen Jun 19 '19 at 19:56
  • @MartinHonnen As pointed out earlier, and per the code, the **for expression** can be used for cases like this in XPath 2.0 – Dimitre Novatchev Jun 27 '19 at 23:48
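
For reference, the `let` variant raised in the comments might look roughly like the following in XSLT 3. This is only a sketch: it assumes each Emp has exactly one AIB_Position/AIB holding an integer, and it keeps the first element if several share the maximum.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:output indent="yes"/>

  <!-- Copy everything that is not matched by a more specific template -->
  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="Employees">
    <xsl:copy>
      <xsl:for-each-group select="Emp" composite="yes" group-by="Emp_id, Emp_Name, Country">
        <!-- Bind the group maximum once with max(), then select the first Emp that has it -->
        <xsl:sequence select="
            let $max := max(current-group()/AIB_Position/AIB/xs:integer(.))
            return current-group()[xs:integer(AIB_Position/AIB) = $max][1]"/>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

This combines the composite grouping key with the single-pass max() lookup, so no sorting is needed.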