
For each "agency" node I need to find the "stmt" elements that share the same key1, key2 and key3 values and output a single "stmt" node with their "comm" and "prem" values summed. Any "stmt" element within that "agency" that doesn't match another "stmt" on key1, key2 and key3 should be output as is. So after the transformation the first "agency" node would have only two "stmt" nodes (one of them summed), and the second "agency" node would pass through unchanged because its keys don't match.

XSLT 1.0 or 2.0 solutions are both fine, though my stylesheet is currently 1.0. Note that an "agency" node can contain any number of "stmt" elements with matching keys, which need to be grouped and summed, and any number without.

<statement>
<agency>
    <stmt>
        <key1>1234</key1>
        <key2>ABC</key2>
        <key3>15.000</key3>
        <comm>75.00</comm>
        <prem>100.00</prem>
    </stmt>
    <stmt>
        <key1>1234</key1>
        <key2>ABC</key2>
        <key3>15.000</key3>
        <comm>25.00</comm>
        <prem>200.00</prem>
    </stmt>
    <stmt>
        <key1>1234</key1>
        <key2>ABC</key2>
        <key3>17.50</key3>
        <comm>25.00</comm>
        <prem>100.00</prem>
    </stmt>
</agency>
<agency>
    <stmt>
        <key1>5678</key1>
        <key2>DEF</key2>
        <key3>15.000</key3>
        <comm>10.00</comm>
        <prem>20.00</prem>
    </stmt>
    <stmt>
        <key1>5678</key1>
        <key2>DEF</key2>
        <key3>17.000</key3>
        <comm>15.00</comm>
        <prem>12.00</prem>
    </stmt>
</agency>
</statement>

johkar

3 Answers

And an XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 exclude-result-prefixes="xs"
 >
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="node()|@*">
   <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="agency">
  <agency>
   <xsl:for-each-group select="stmt" group-by=
    "concat(key1, '+', key2, '+', key3)">

    <stmt>
      <xsl:copy-of select=
       "current-group()[1]/*[starts-with(name(),'key')]"/>

       <comm>
         <xsl:value-of select="sum(current-group()/comm)"/>
       </comm>
       <prem>
         <xsl:value-of select="sum(current-group()/prem)"/>
       </prem>
    </stmt>
   </xsl:for-each-group>
  </agency>
 </xsl:template>
</xsl:stylesheet>
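
If the key values could ever contain the separator, a joined string is ambiguous: `concat('1+', '+', '2')` and `concat('1', '+', '+2')` both yield `1++2`. A minimal sketch of one way around this in XSLT 2.0, assuming the same input structure as in the question, is to nest `xsl:for-each-group` so that each key is compared on its own and no joined string is built:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Identity rule: copy everything that is not handled below -->
 <xsl:template match="node()|@*">
   <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="agency">
  <agency>
   <!-- Group on one key at a time instead of on a concatenated string -->
   <xsl:for-each-group select="stmt" group-by="key1">
    <xsl:for-each-group select="current-group()" group-by="key2">
     <xsl:for-each-group select="current-group()" group-by="key3">
      <stmt>
       <!-- Key elements are taken from the first stmt of the group -->
       <xsl:copy-of select="current-group()[1]/(key1|key2|key3)"/>
       <comm>
         <xsl:value-of select="sum(current-group()/comm)"/>
       </comm>
       <prem>
         <xsl:value-of select="sum(current-group()/prem)"/>
       </prem>
      </stmt>
     </xsl:for-each-group>
    </xsl:for-each-group>
   </xsl:for-each-group>
  </agency>
 </xsl:template>
</xsl:stylesheet>
```

With nesting, groups are emitted in order of first appearance of key1, then key2, then key3, so the order of the output "stmt" elements can differ from the single-key version when statements with different keys are interleaved.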
Dimitre Novatchev
  • The `concat(key1,key2,key3)` is going to fail in certain cases, for instance `key1="1A" key2="B" key3="1.000"` and `key1="1" key2="AB" key3="1.000"`... I feel that concatenating strings without intimate knowledge of their contents (or a restriction on them) is wrong. – Lucero May 04 '10 at 21:10
  • @Lucero: Thanks again, there is nothing wrong with the concat -- it was something that slipped past me -- I've been feeling so sleepy the whole day today -- which is now corrected. Please do let me know if the correction satisfies you. This correction is typical in this kind of solution. – Dimitre Novatchev May 04 '10 at 21:49
  • @Dimitre, in contrast to the other `concat` issue, the `+` isn't a suitable separator here, since the XML data may theoretically very well have key strings with `+` in them - think of `key1="1+" key2="2"` and `key1="1" key2="+2"`. So my point is that you should only concat when you know that the separator will never be part of the concatenated data. – Lucero May 04 '10 at 23:06
  • @Lucero: While this is in principle true, people who use this method are well aware of the possible problem. Only they know the value space of their data, and they can usually choose in a well-informed manner. All solutions at XSLT-related forums use the `"|"` as the breaking string, although people know that in some cases this might not be a good choice. Anyway, thanks for your insistent reminder, although this isn't something new. – Dimitre Novatchev May 05 '10 at 00:54
  • @Lucero: (Cont.): I am using `"+"` consistently, because I believe it (symbolically) expresses the nature of the concatenation operation. If this issue is really that important to you, why don't you use whatever you consider a really rare string? Something like: `'!+|@#$%^*()'`. Anyway, thanks for your insistent reminder, although this isn't something new. – Dimitre Novatchev May 05 '10 at 01:02
  • @Dimitre, you wrote "people who use this method are well aware of the possible problem". On a site like SO, where the person asking the question as well as people searching the site may not know the technique, I feel that it is important to make readers aware of any limitations or things to keep in mind when using a specific solution. I was only trying to point this out. – Lucero May 05 '10 at 08:34

In XSLT 1.0 use the Muenchian method for grouping (with compound key).

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Compound key scoped to the parent agency; '+' cannot occur in a generated ID -->
 <xsl:key name="kStmtByKeys" match="stmt"
      use="concat(generate-id(..), '+', key1, '+', key2, '+', key3)"/>

 <xsl:template match="node()|@*">
   <xsl:copy>
    <xsl:apply-templates select="node()|@*"/>
   </xsl:copy>
 </xsl:template>

 <xsl:template match="agency">
   <agency>
    <!-- Muenchian step: visit only the first stmt of each key group -->
    <xsl:for-each select=
     "stmt[generate-id()
          =
           generate-id(key('kStmtByKeys',
                           concat(generate-id(..), '+', key1, '+', key2, '+', key3)
                           )[1]
                       )
           ]
     ">
      <xsl:variable name="vkeyGroup" select=
       "key('kStmtByKeys', concat(generate-id(..), '+', key1, '+', key2, '+', key3))"/>

     <stmt>
      <xsl:copy-of select="*[starts-with(name(), 'key')]"/>
      <comm>
       <xsl:value-of select="sum($vkeyGroup/comm)"/>
      </comm>
      <prem>
       <xsl:value-of select="sum($vkeyGroup/prem)"/>
      </prem>
     </stmt>
    </xsl:for-each>
   </agency>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document, produces the wanted result:

<statement>
    <agency>
        <stmt>
            <key1>1234</key1>
            <key2>ABC</key2>
            <key3>15.000</key3>
            <comm>100</comm>
            <prem>300</prem>
        </stmt>
        <stmt>
            <key1>1234</key1>
            <key2>ABC</key2>
            <key3>17.50</key3>
            <comm>25</comm>
            <prem>100</prem>
        </stmt>
    </agency>
    <agency>
        <stmt>
            <key1>5678</key1>
            <key2>DEF</key2>
            <key3>15.000</key3>
            <comm>10</comm>
            <prem>20</prem>
        </stmt>
        <stmt>
            <key1>5678</key1>
            <key2>DEF</key2>
            <key3>17.000</key3>
            <comm>15</comm>
            <prem>12</prem>
        </stmt>
    </agency>
</statement>
Dimitre Novatchev
  • If I understood the question correctly, your solution is broken when another agency has `stmt` nodes with the same keys. To me it seems that since there are multiple agencies, the Muenchian method with the global key isn't going to work. – Lucero May 04 '10 at 20:56
  • @Lucero: A good observation, thanks. This is now corrected and I am still using the Muenchian method with a compound key. – Dimitre Novatchev May 04 '10 at 21:05
  • Hm, is this way of generating a compound key guaranteed to give the wanted results in all situations? If one key was `concat('1', '23')` and another was `concat('12', '3')` (you get the idea), this may produce problems depending on the input document and the XSLT processor. – Lucero May 04 '10 at 21:08
  • Thank you both for the detailed answers and the pitfalls. Concatenation would work for my data. I'll look over these options more closely to determine the optimum for my current and future data. – johkar May 04 '10 at 21:27
  • @Lucero: Do you notice the "breaking" `'+'` in the arguments to `concat()`? *This* is what guarantees avoiding any conflicts -- of course we must be sure to select a string that is never going to be the ending and/or beginning of the true data values that are concatenated to form the key. – Dimitre Novatchev May 04 '10 at 21:53
  • @Dimitre, yes, I saw it, but I didn't have the exact spec of the generate-id() function output at hand, which is also why I wrote this as a question ("in all situations?"). But you're right, the "+" character is not allowed as part of a generated ID, which makes it a suitable separator here. http://www.w3.org/TR/xslt#function-generate-id – Lucero May 04 '10 at 23:00
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:msxsl="urn:schemas-microsoft-com:xslt" exclude-result-prefixes="msxsl">
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/|*">
        <xsl:copy>
            <xsl:apply-templates select="*" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="stmt">
        <xsl:variable name="stmtGroup" select="../stmt[(key1=current()/key1) and (key2=current()/key2) and (key3=current()/key3)]" />
        <xsl:if test="generate-id()=generate-id($stmtGroup[1])">
            <xsl:copy>
                <key1>
                    <xsl:value-of select="key1"/>
                </key1>
                <key2>
                    <xsl:value-of select="key2"/>
                </key2>
                <key3>
                    <xsl:value-of select="key3"/>
                </key3>
                <comm>
                    <!-- '0.00' keeps a leading zero for sums below 1 -->
                    <xsl:value-of select="format-number(sum($stmtGroup/comm), '0.00')"/>
                </comm>
                <prem>
                    <xsl:value-of select="format-number(sum($stmtGroup/prem), '0.00')"/>
                </prem>
            </xsl:copy>
        </xsl:if>
    </xsl:template>
</xsl:stylesheet>
Lucero