I have two example XML files:

<?xml version="1.0" encoding="UTF-8"?> 
<foobar>
<foo>ONE</foo>  
<bar>a</bar> 
</foobar>

<?xml version="1.0" encoding="UTF-8"?> 
<foobar>
<foo>ONE</foo>
<foo>two</foo>  
<bar>a</bar> 
</foobar>

The desired output for the first XML file is identical to its input. For the second example it is:

<foobar>
<s>
<s>
<foo>ONE</foo>
<foo>two</foo>
</s>
<s>
<bar>a</bar>
</s>
</s>
</foobar>

I have an XSLT stylesheet that wraps each run of consecutive same-named elements in an "s" element; otherwise it outputs the XML file unchanged.

My XSLT is:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:key name="kGroupLeader" match="*"
           use="generate-id(self::*[name() != name(preceding-sibling::*[1])])"/>

  <xsl:key name="checkgroup" match="*"
           use="self::*[name() = name(preceding-sibling::*[1])]"/>

  <xsl:template match="*[*]">
    <xsl:copy>
      <xsl:choose>
        <xsl:when test="key('checkgroup', *)">
          <s>
            <xsl:for-each select="*[key('kGroupLeader', generate-id())]">
              <s>
                <xsl:apply-templates select=". | following-sibling::*[
                    name() = name(current())
                    and generate-id(current()) = generate-id(
                      preceding-sibling::*[key('kGroupLeader', generate-id())][1]
                    )
                  ]"/>
              </s>
            </xsl:for-each>
          </s>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="."/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

It works, but it uses a lot of memory and takes a long time to process large XML files. How can I make my XSLT faster?

  • Performance depends on the XSLT processor you are using. You need to tell us. And how large are the files, and how long does it take? – Michael Kay Apr 04 '14 at 11:34
  • @MichaelKay I am using xsltproc. Processing a 300 KB XML file takes 40 seconds and 174 MB of memory, which is a lot compared with a C implementation that takes less time and only 8 MB. When I removed the checkgroup key, memory usage dropped to 12 MB, but I need that key to check whether the XML contains any sequence. –  Apr 04 '14 at 16:50
  • The following-sibling::*[preceding-sibling::*] combination is intrinsically quadratic in the number of siblings. I haven't worked out the logic of your transformation but there is almost certainly a more efficient approach, especially if you switch to XSLT 2.0. – Michael Kay Apr 05 '14 at 17:29

1 Answer

Rather than using keys, an alternative approach might be to use tail-recursive templates to implement a kind of "while loop":

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes" />

  <!-- normal case - identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>

  <!-- for elements that contain adjacent child elements with the same name -->
  <xsl:template match="*[*[name() = name(preceding-sibling::*[1])]]">
    <xsl:copy>
      <!-- wrap contents in an s -->
      <s>
        <!-- wrap each "run" of consecutive elements with the same name in
             another s.  We do this by applying "seq" mode templates to
             the _first_ element in each run. -->
        <xsl:for-each select="*[name() != name(preceding-sibling::*[1])]">
          <s><xsl:apply-templates select="." mode="seq" /></s>
        </xsl:for-each>
      </s>
    </xsl:copy>
  </xsl:template>

  <!-- tail recursion - process self with normal mode templates, then recurse
       with this template for next sibling if its name matches mine -->
  <xsl:template match="*" mode="seq">
    <xsl:apply-templates select="." />
    <xsl:apply-templates mode="seq"
       select="following-sibling::*[1][name() = name(current())]" />
  </xsl:template>
</xsl:stylesheet>

The tail-recursive seq mode template is effectively a loop: keep processing elements (using the default-mode templates) until you reach one with a different name.
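
For comparison, the XSLT 2.0 route suggested in the comments on the question might look like the sketch below. It assumes an XSLT 2.0 processor such as Saxon is available (xsltproc only implements XSLT 1.0, so it cannot run this), and it has not been timed against the stylesheets above. The match pattern is the same as in the 1.0 version; only the loop over runs is replaced by xsl:for-each-group with group-adjacent.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">

  <xsl:strip-space elements="*"/>
  <xsl:output indent="yes"/>

  <!-- normal case - identity transform -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- for elements that contain adjacent child elements with the same name -->
  <xsl:template match="*[*[name() = name(preceding-sibling::*[1])]]">
    <xsl:copy>
      <!-- outer s wraps all the runs -->
      <s>
        <!-- group-adjacent starts a new group whenever name() changes,
             so each group is one run of consecutive same-named elements -->
        <xsl:for-each-group select="*" group-adjacent="name()">
          <s><xsl:apply-templates select="current-group()"/></s>
        </xsl:for-each-group>
      </s>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Like the 1.0 version, this only looks at child elements of the matched element, so any attributes or text directly inside a wrapped parent are not copied.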

Ian Roberts
  • Thanks, there is a dramatic change in time and memory usage; it's much faster. Was it my use of two keys that made the process so slow? –  Apr 05 '14 at 16:31
  • @Sam I'm honestly not sure what exactly in your version makes it so slow, but I can say that in my version I'm careful to make each `name` test against just one other element. Testing whether an element matches the `*[*[name() = name(...)]]` template will be at worst linear in the number of that node's children. – Ian Roberts Apr 05 '14 at 18:43