1

I'm a beginner in XSLT and have found out that I cannot simply keep adding numbers to a variable, because a variable's value cannot be changed once it is set.

I have an XML document with a list of numbers that I need to add up until an element with a specific attribute value appears; then I print the total, reset it to 0, and continue adding up the rest until I see that attribute value again.

For example, I have this XML:

<list>
 <entry>
  <field type="num" value="189.5" />
 </entry>
 <entry>
  <field type="num" value="1.5" />
 </entry>
 <entry>
  <field type="summary" />
 </entry>
 <entry>
  <field type="num" value="9.5" />
 </entry>
 <entry>
  <field type="num" value="11" />
 </entry>
 <entry>
  <field type="num" value="10" />
 </entry>
 <entry>
  <field type="summary" />
 </entry>
</list>

Now I want my XSLT to print this:

189.5
1.5
#191#
9.5
11
10
#30.5#

I have read that I can do this by using sum() with conditions. I know how to use for-each and point to elements relatively, and I am also able to use sum() to total all fields having type=num. But how do I sum only the num values up to the first type=summary, and then only those between the previous type=summary and the next one?

I would expect something like this:

<xsl:for-each select="list/entry">
 <xsl:if test="field[@type='summary']">
  <!-- we are now at a type=summary element, now sum up -->
  #<xsl:value-of select="sum(WHAT_TO_PUT_HERE?)" />#
 </xsl:if>
 <xsl:if test="field[@type='num']">
  <xsl:value-of select="field/@value" />
 </xsl:if>
</xsl:for-each>

Appreciate any help.

Dimitre Novatchev
NovumCoder

4 Answers

6

I. Here is a simple, forward-only solution -- do note that no reverse axis is used and the time complexity is just O(N) and the space complexity is just O(1).

This is probably the simplest and fastest of all presented solutions:

No monstrous complexity or grouping is required at all ...

No variables, no keys (and no space taken for caching key->values), no sum() ...

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

  <xsl:template match="/*"><xsl:apply-templates select="*[1]"/></xsl:template>

  <xsl:template match="entry[field/@type = 'num']">
    <xsl:param name="pAccum" select="0"/>
    <xsl:value-of select="concat(field/@value, '&#xA;')"/>
    <xsl:apply-templates select="following-sibling::entry[1]">
      <xsl:with-param name="pAccum" select="$pAccum+field/@value"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="entry[field/@type = 'summary']">
    <xsl:param name="pAccum" select="0"/>
    <xsl:value-of select="concat('#', $pAccum, '#&#xA;')"/>  
    <xsl:apply-templates select="following-sibling::entry[1]"/>
  </xsl:template>
</xsl:stylesheet>

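This transformation is written in a streaming style: it only ever moves forward through the document, so in principle it doesn't require the complete XML document tree to be present in memory and could be used to process documents of indefinite length. Note, however, that typical XSLT 1.0 processors still build the whole tree before transforming.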

When the transformation is applied on the provided source XML document:

<list>
    <entry>
        <field type="num" value="189.5" />
    </entry>
    <entry>
        <field type="num" value="1.5" />
    </entry>
    <entry>
        <field type="summary" />
    </entry>
    <entry>
        <field type="num" value="9.5" />
    </entry>
    <entry>
        <field type="num" value="11" />
    </entry>
    <entry>
        <field type="num" value="10" />
    </entry>
    <entry>
        <field type="summary" />
    </entry>
</list>

the wanted, correct result is produced:

189.5
1.5
#191#
9.5
11
10
#30.5#

II. Update

When the transformation above is run on sufficiently big XML documents with an XSLT processor that doesn't optimize tail recursion, it causes a stack overflow due to the long chain of <xsl:apply-templates> calls.

Below is another transformation, which doesn't cause stack overflow even with extremely big XML documents. Again, no reverse axes, no keys, no "grouping", no conditional instructions, no count(), no <xsl:variable> ...

And, most importantly, compared with the "efficient", key-based Muenchian grouping, this transformation takes only 61% of the time of the latter, when run on an XML document having 105 000 (105 thousand) lines:

<xsl:stylesheet version="1.0"  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/*">
  <xsl:apply-templates select=
  "*[1] | entry[field/@type = 'summary']/following-sibling::*[1]"/>
 </xsl:template>

  <xsl:template match="entry[field/@type = 'num']">
    <xsl:param name="pAccum" select="0"/>

    <xsl:value-of select="concat(field/@value, '&#xA;')"/>

    <xsl:apply-templates select="following-sibling::entry[1]">
        <xsl:with-param name="pAccum" select="$pAccum+field/@value"/>
    </xsl:apply-templates>
  </xsl:template>

  <xsl:template match="entry[field/@type = 'summary']">
    <xsl:param name="pAccum" select="0"/>

    <xsl:value-of select="concat('#', $pAccum, '#&#xA;')"/>
 </xsl:template>
</xsl:stylesheet>

Additionally, this transformation can be sped up to take less than 50% of the time taken by the Muenchian grouping transformation (that is, to run more than twice as fast) by replacing every element name with just *.
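As a rough sketch of what that substitution might look like (assuming, as in the sample document, that every child of the top element wraps exactly one element carrying @type, and that nothing else in the document needs to be told apart by name), the second transformation with every element name replaced by * could be written as:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Start one chain at the first child and one after every summary -->
  <xsl:template match="/*">
    <xsl:apply-templates select=
      "*[1] | *[*/@type = 'summary']/following-sibling::*[1]"/>
  </xsl:template>

  <!-- A number: print it and pass the running total to the next sibling -->
  <xsl:template match="*[*/@type = 'num']">
    <xsl:param name="pAccum" select="0"/>
    <xsl:value-of select="concat(*/@value, '&#xA;')"/>
    <xsl:apply-templates select="following-sibling::*[1]">
      <xsl:with-param name="pAccum" select="$pAccum + */@value"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- A summary: print the total received so far and end this chain -->
  <xsl:template match="*[*/@type = 'summary']">
    <xsl:param name="pAccum" select="0"/>
    <xsl:value-of select="concat('#', $pAccum, '#&#xA;')"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should print the same seven lines as before. Whether it is actually faster depends on the processor, so the timing figures above are best re-measured rather than assumed.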

A lesson for us all to learn: A non-key solution sometimes can be more efficient than a key-based one.

Dimitre Novatchev
  • Great solution. One question regarding the pAccum param: I thought we cannot use "variables" and keep adding to them. I tried with xsl:variable and it would not let me. What is different about xsl:param and xsl:with-param then? – NovumCoder Apr 21 '15 at 06:31
  • @NovumCoder As short explanation - the value of an `<xsl:variable>` _can't_ be changed, but the value of an `<xsl:param>` is only a _default_ value and can be overridden when the template is called. Here, initially the param has the value `0` when matching the first `num`. For the following field, the template is called with the current value of `pAccum` (0 + 189.5), and for the next field - which is the 1st `summary` - with 191 (189.5 + 1.5). More details e.g. here: http://www.xml.com/lpt/a/726 – matthias_h Apr 21 '15 at 07:04
  • 1
    @NovumCoder, the same-named parameter that is passed to template2 from template1 is not the same parameter (as object) that was passed to template1 -- they just happen to have the same name. Therefore, the principle of immutability of variables is not violated. BTW, you have accepted a solution which can be hundreds or thousands of times slower (with big XML documents) than this solution or the Muenchian grouping solution, which is still less efficient than this solution. – Dimitre Novatchev Apr 21 '15 at 13:31
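
To illustrate the point made in these comments, here is a minimal sketch. The named template `show` is hypothetical and exists only for this demonstration. It is called once without a parameter, so the default applies, and once with `xsl:with-param`, which supplies a fresh value for that call only:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- No xsl:with-param: the default declared in xsl:param is used -->
    <xsl:call-template name="show"/>
    <!-- With xsl:with-param: this call receives its own value -->
    <xsl:call-template name="show">
      <xsl:with-param name="pAccum" select="189.5 + 1.5"/>
    </xsl:call-template>
  </xsl:template>

  <!-- Hypothetical helper template, not part of any answer above -->
  <xsl:template name="show">
    <xsl:param name="pAccum" select="0"/>
    <xsl:value-of select="concat($pAccum, '&#xA;')"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to any XML document, this should print 0 and then 191. Nothing is ever reassigned: each call simply gets its own parameter binding.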
2

Just as a different solution to the grouping suggested as comment - you could also use match patterns to get the sums:

<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="xml" omit-xml-declaration="yes" encoding="UTF-8" indent="yes" />
  <xsl:strip-space elements="*"/>
  <xsl:template match="field[@type='num']">
    <xsl:value-of select="@value"/>
    <xsl:text>&#x0A;</xsl:text>
  </xsl:template>
  <xsl:template match="entry[field[@type='summary']]">
    <xsl:variable name="sumCount" select="count(preceding-sibling::entry[field[@type='summary']])"/>
    <xsl:text>#</xsl:text>
    <xsl:value-of select="sum(preceding-sibling::entry[count(preceding-sibling::entry[field[@type='summary']]) = $sumCount]/field[@type='num']/@value)"/>
    <xsl:text>#&#x0A;</xsl:text>
  </xsl:template>
</xsl:transform>

When applied to your input XML this produces the output

189.5
1.5
#191#
9.5
11
10
#30.5#

The template matching field[@type='num'] prints the value and adds a newline, and the template matching entry[field[@type='summary']] uses the variable

<xsl:variable name="sumCount" select="count(preceding-sibling::entry[field[@type='summary']])"/>

to check how many previous fields of the type summary occurred. Then only the sum of all values of entries of the type num with the same number of preceding summary fields is printed:

<xsl:value-of select="sum(preceding-sibling::entry[
                      count(preceding-sibling::entry[field[@type='summary']]) = $sumCount
                      ]/field[@type='num']/@value)"/>

Update: To explain in more detail how this works as requested: In the template matching entry[field[@type='summary']] the variable sumCount counts all previous entries that have a field of type summary:

count(preceding-sibling::entry[field[@type='summary']])

So when the template matches the first summary field, the value of sumCount is 0, and when matching the second summary field, sumCount is 1.
The second line using the sum function

sum(
    preceding-sibling::entry
     [
      count(preceding-sibling::entry[field[@type='summary']]) = 
      $sumCount
     ]
     /field[@type='num']/@value
   )

sums all field[@type='num']/@value for all previous (preceding) entries that have the same number of previous fields of type summary as the current field of type summary:

count(preceding-sibling::entry[field[@type='summary']]) = $sumCount

So when the second summary is matched, only the values of the num fields with the values 9.5, 11 and 10 will be summed, as they have the same number of previous summary fields as the current summary field.
For the num fields with the values 189.5 and 1.5,

count(preceding-sibling::entry[field[@type='summary']]) 

is 0, which differs from the current sumCount of 1, so these fields are omitted in the sum function.

matthias_h
  • Works great, but to be honest I have no idea why it works. Can you explain the second line with the sum function and the way you use the entry count, kind of assigning it to $sumCount? What does it do? – NovumCoder Apr 19 '15 at 10:38
  • @NovumCoder Glad I was able to help, and I've just updated my answer with some more detailed explanation. In case there's still something unclear just let me know. – matthias_h Apr 19 '15 at 11:17
  • Wait, it just stopped working for another case. Keep in mind that the number of type=num entries can vary. This time I have two nums, then a summary, then again two nums and a summary. Now the first summary simply gives me ",00" and the second summary gives me the value from the last type=num only. – NovumCoder Apr 19 '15 at 12:45
  • @NovumCoder That shouldn't be an issue. I've just saved the original XML and XSLT in this demo: http://xsltransform.net/nc4NzQG If one of the nums for the second summary is deleted, it still works as intended. You can add your XML there, update the Demo and let me know the URL so I can check why it's not working as intended. – matthias_h Apr 19 '15 at 13:03
  • 1
    This is O(N^2) in the general case, and too complicated for this kind of problem. I tested the two solutions -- yours and mine -- with a moderate XML document (around 1000 lines) and even in this mild case the forward-only solution ran 100 times faster. With a document with about 5000 lines the forward-only solution was 4500 times faster. I got these results running the transformations with MSXML4. I believe @NovumCoder could consider trying the forward-only solution, too :) – Dimitre Novatchev Apr 19 '15 at 16:46
  • 2
    @DimitreNovatchev Thanks for testing all provided solutions and sharing the results. Just as explanation - I didn't intend to provide the _best_ solution here, only one _possible_ different from the proposed grouping suggested by michael.hor257k (sometimes I just like to solve XSLT issues using match patterns, regardless of possible efficiency issues). Though not able to confirm michael's statement that your approach crashes the proc. (worked for me on all proc. on xsltransform.net), I trust your experience and think re efficiency your and michael's solutions are better and +1 both. – matthias_h Apr 19 '15 at 17:54
  • 2
    @matthias_h, You are welcome. I think the main value of SO is for everybody to learn and have fun. I hope my solution and comments helped us do that today :) – Dimitre Novatchev Apr 19 '15 at 18:09
  • @matthias_h, In the Update to my answer I provided a transformation which doesn't cause stack-overflow -- even when run with XSLT processors that don't optimize tail-recursion. – Dimitre Novatchev Apr 20 '15 at 01:23
1

You need a variation on Muenchian grouping. Start by defining a key as:

<xsl:key name="numbers" match="entry[field/@type='num']" use="generate-id(following-sibling::entry[field/@type='summary'][1])" />

then use:

#<xsl:value-of select="sum(key('numbers', generate-id())/field/@value)" />#

to sum the numbers in the current group.
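
A minimal sketch of how these two fragments might fit into a complete stylesheet. The text output method, the `/list` template and the `xsl:choose` wrapper are assumptions added for illustration; only the key definition and the `sum(key(...))` expression come from the answer itself:

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Index every num entry under the id of the first summary entry after it -->
  <xsl:key name="numbers" match="entry[field/@type='num']"
           use="generate-id(following-sibling::entry[field/@type='summary'][1])"/>

  <xsl:template match="/list">
    <xsl:for-each select="entry">
      <xsl:choose>
        <!-- Summary entry: look up its group by its own id and total it -->
        <xsl:when test="field/@type='summary'">
          <xsl:value-of select="concat('#',
              sum(key('numbers', generate-id())/field/@value), '#&#xA;')"/>
        </xsl:when>
        <!-- Num entry: just print the value -->
        <xsl:otherwise>
          <xsl:value-of select="concat(field/@value, '&#xA;')"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input in the question this should print 189.5, 1.5, #191#, 9.5, 11, 10 and #30.5#, one per line. Num entries with no following summary end up under the empty-string key and are never summed.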

michael.hor257k
  • For everyone who cares about performance: With an XML document of about 1000 lines, this solution is more than 3.5 times slower than the simpler, forward-only solution. With an XML document of about 5000 lines, this solution is 5 times slower than the simpler, forward-only solution – Dimitre Novatchev Apr 19 '15 at 16:51
  • It is unwise to make general assumptions about performance from testing it with a specific processor. Running a test with an XML document of about 5000 lines, using Xalan and Saxon 6.5 processors, I could not tell a difference between this method and the "simpler, forward-only solution". The execution was instant in both cases. I do not have a tool to measure the elapsed time for these two processors. I do have such a tool when running libxslt: however I could not make the comparison, because **the "simpler, forward-only solution" crashed the processor every time**. – michael.hor257k Apr 19 '15 at 17:11
  • michael.hor257k, Saxon has this command-line option for measuring and reporting execution time: -t . MSXSL also has a similar option. With both MSXML4 and Saxon I get similar results for the performance of the two solutions. On an XSLT processor that doesn't optimize tail-recursion, the forward-only transformation can cause stack-overflow for big documents. A DVC (Divide and Conquer) variation of this transformation works without problem on any such processor, preserving the simplicity and efficiency of the original transformation. – Dimitre Novatchev Apr 19 '15 at 17:45
  • 2
    michael.hor257k, **See the Update in my answer**. It provides a transformation that doesn't cause stack overflow, doesn't reference any reverse axes, doesn't use any keys, and (with Saxon 6.5.3) takes only 61% of the time that a Muenchian grouping (using the key definition from your answer) takes -- with 105 000 (105 thousand) - lines long XML document. Using keys is not always the most efficient solution! – Dimitre Novatchev Apr 20 '15 at 01:24
0

Too late to the party, and almost the same as what matthias_h did:

<?xml version="1.0" encoding="utf-8"?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text"/>

  <xsl:template match="//field[@type='num']">
    <xsl:value-of select="concat(@value,'&#x0a;')"/>
  </xsl:template>

  <xsl:template match="//field[@type='summary']">
    <xsl:variable name="prevSumCnt" select="count(preceding::field[@type='summary'])"/>
    <xsl:variable name="sum" select="sum(preceding::field[count(preceding::field[@type='summary'])=$prevSumCnt]/@value)"/>
    <xsl:value-of select="concat('#',$sum,'#&#x0a;')"/>
  </xsl:template>

  <xsl:template match="text()"/>
</xsl:transform>

The idea is to sum all fields that have the same number of summary fields before them as the current summary field.

leu