6

I have a terrible piece of XML that I need to process through BizTalk, and I have managed to normalise it into this example below. I am no XSLT ninja, but between the web and the VS2010 debugger, I can find my way around XSL.

I now need a clever bit of XSLT to "weed out" the duplicate elements and only keep the latest ones, as decided by the date in the ValidFromDate attribute.

The ValidFromDate attribute is of the XSD:Date type.

<SomeData>
  <A ValidFromDate="2011-12-01">A_1</A>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B CalidFromDate="2011-12-03">B_1</B>
  <B ValidFromDate="2012-01-17">B_2</B>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
  <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>

After a transformation I'd like to only keep these lines:

<SomeData>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
</SomeData>

Any clues as to how I put that XSL together? I've emptied the internet trying to look for a solution, and I have tried a lot of clever XSL sorting scripts, but none I felt took me in the right direction.

LarsWA
  • 571
  • 5
  • 15
  • Also ... as this is going to be called from a BizTalk map, and thus by .NET I am limited to XSLT 1.0 ... – LarsWA Jan 26 '12 at 18:24
  • 2
    Maybe `C_1` instead of `C_2`? – Kirill Polishchuk Jan 26 '12 at 20:00
  • Yes off course ... thanks. Edited this in my quest. – LarsWA Jan 27 '12 at 09:05
  • First of all ... A LOT of really great solutions. I get better at my XSLT mojo reading through all of them. I didn't have time to try out ALL of them, and there are others than my selected solution that would have done the trick. – LarsWA Jan 27 '12 at 09:08

6 Answers6

3

The optimal solution for this problem with Xslt 1.0 would be to use Muenchian grouping. (Given that the elements are already sorted by the ValidFromDate attribute) the following stylesheet should do the trick:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="element-key" match="/SomeData/*" use="name()" />

  <xsl:template match="/SomeData">
    <xsl:copy>
      <xsl:for-each select="*[generate-id() = generate-id(key('element-key', name()))]">
        <xsl:copy-of select="(. | following-sibling::*[name() = name(current())])[last()]" />
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>

Here is the result I got when running it against your sample Xml:

<?xml version="1.0" encoding="utf-8"?>
<SomeData>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>
Pawel
  • 31,342
  • 4
  • 73
  • 104
2

The following stylesheet produces the correct result without any reliance on the input order:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>
    <xsl:key name="byName" match="/SomeData/*" use="name()"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="SomeData">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:for-each select="*[generate-id()=
                                    generate-id(key('byName', name())[1])]">
                <xsl:apply-templates select="key('byName', name())" mode="out">
                    <xsl:sort select="translate(@ValidFromDate, '-', '')" 
                              data-type="number" order="descending"/>
                </xsl:apply-templates>
            </xsl:for-each>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="SomeData/*" mode="out">
        <xsl:if test="position()=1">
            <xsl:apply-templates select="."/>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>

Output:

<SomeData>
   <A ValidFromDate="2012-01-19">A_2</A>
   <B ValidFromDate="2012-01-19">B_3</B>
   <C ValidFromDate="2012-01-20">C_1</C>
</SomeData>

Note that the result is slightly different than what you listed as the desired output, because C_1 is actually the latest C element (i.e. the input is not already sorted). By relying on an initial sort order (and blindly following the listed expected output) the existing answers are actually incorrect.

Explanation:

  • An xsl:key groups all /SomeData/* by name()
  • The outer for-each selects the first item in each group
  • Templates are then applied to all members of that group, which are sorted by @ValidFromDate
  • A single additional template handles picking the first element out of each sorted group
  • An Identity Transform template takes care of the rest
Wayne
  • 59,728
  • 15
  • 131
  • 126
  • Hi lwburk ... thanks for pointing out the C_1 thingy. Me bad. Solution worked fine though. I went with Dimitre's solution as it was a little shorter and easier to overview and thus maintain for those that will have to do maintenance on this. – LarsWA Jan 27 '12 at 09:11
2

Based on @ValidFromDate order:

XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:key name="k" match="*" use="name()"/>

  <xsl:template match="SomeData">
    <xsl:copy>
      <xsl:apply-templates select="*[generate-id() = 
                           generate-id(key('k', name()))]"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*">
    <xsl:apply-templates select="key('k', name())" mode="a">
      <xsl:sort select="@ValidFromDate" order="descending"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="*" mode="a">
    <xsl:if test="position() = 1">
      <xsl:copy-of select="."/>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

applied on:

<SomeData>
  <A ValidFromDate="2011-12-01">A_1</A>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B CalidFromDate="2011-12-03">B_1</B>
  <B ValidFromDate="2012-01-17">B_2</B>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
  <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>

produces:

<SomeData>
  <A ValidFromDate="2012-01-19">A_2</A>
  <B ValidFromDate="2012-01-19">B_3</B>
  <C ValidFromDate="2012-01-20">C_1</C>
</SomeData>
Kirill Polishchuk
  • 54,804
  • 11
  • 122
  • 125
2

A slightly simpler and shorter XSLT 1.0 solution than that of @lwburk:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:key name="kName" match="*/*" use="name()"/>

 <xsl:template match="/">
  <xsl:apply-templates select=
   "*/*[generate-id()
       =
        generate-id(key('kName', name())[1])
       ]
   "/>
 </xsl:template>

 <xsl:template match="*/*">
  <xsl:for-each select="key('kName', name())">
   <xsl:sort select="@ValidFromDate" order="descending"/>
   <xsl:if test="position() = 1">
    <xsl:copy-of select="."/>
   </xsl:if>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

when this transformation is applied on the provided XML document:

<SomeData>
    <A ValidFromDate="2011-12-01">A_1</A>
    <A ValidFromDate="2012-01-19">A_2</A>
    <B CalidFromDate="2011-12-03">B_1</B>
    <B ValidFromDate="2012-01-17">B_2</B>
    <B ValidFromDate="2012-01-19">B_3</B>
    <C ValidFromDate="2012-01-20">C_1</C>
    <C ValidFromDate="2011-01-20">C_2</C>
</SomeData>

the wanted, correct result is produced:

<A ValidFromDate="2012-01-19">A_2</A>
<B ValidFromDate="2012-01-19">B_3</B>
<C ValidFromDate="2012-01-20">C_1</C>
Dimitre Novatchev
  • 240,661
  • 26
  • 293
  • 431
1

Based on Pawel's answer, I made the following modification, which produces the same result:

<xsl:template match="/SomeData">
  <xsl:copy>
    <xsl:copy-of select="*[generate-id() = generate-id(key('element-key', name())[last()])]"/>
  </xsl:copy>
</xsl:template>

If they produce the same result every time, I like this because it's a little cleaner.

Community
  • 1
  • 1
Zach Young
  • 10,137
  • 4
  • 32
  • 53
1

XLST 2.0 solution without relying on input order.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:fn="http://www.w3.org/2005/xpath-functions">
    <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
    <xsl:template match="/">
        <SomeData>
            <xsl:for-each-group select="/SomeData/*" group-by="name()">
                    <xsl:for-each select="current-group()">
                        <xsl:sort select="number(substring(attribute(),1,4))" order="descending" data-type="number"/> <!-- year-->
                        <xsl:sort select="number(substring(attribute(),6,2))" order="descending" data-type="number"/> <!-- month-->
                        <xsl:sort select="number(substring(attribute(),9,2))" order="descending" data-type="number"/> <!-- date-->
                        <xsl:if test="position()=1">
                                <xsl:sequence select="."/>
                        </xsl:if>
                    </xsl:for-each>
            </xsl:for-each-group>
        </SomeData>
</xsl:template>
</xsl:stylesheet>
user47900
  • 647
  • 4
  • 6
  • Yup .. XSLT 2.0 makes it a little simpler to do this, but Microsoft haven't gotten around to implementing that yet ... Thanks though. – LarsWA Jan 27 '12 at 09:13