
I am looking for a solution that will turn

<p>
<hi rend="bold">aa</hi>
<hi rend="bold">bb</hi>
<hi rend="bold">cc</hi>
Perhaps some text.
<hi rend="italic">dd</hi>
<hi rend="italic">ee</hi>
Some more text.
<hi rend="italic">ff</hi>
<hi rend="italic">gg</hi>
Foo.
</p>

into

<p>
<hi rend="bold">aabbcc</hi>
Perhaps some text.
<hi rend="italic">ddee</hi>
Some more text.
<hi rend="italic">ffgg</hi>
Foo. 
</p>

but my solution should not hardcode the element names or the attribute values (italic, bold). The XSLT should concatenate ALL adjacent sibling elements that have the same name and the same attribute value. Everything else should be left untouched.

I have looked at the solutions that already exist out there but none of them seemed to satisfy all of my requirements.

If anybody has a handy XSLT stylesheet for this, I'd be much obliged.

Tench

3 Answers


This XSLT 2.0 stylesheet merges adjacent sibling elements that have the same name (including namespace) and the same rend attribute value.

<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes" />
<xsl:strip-space elements="*" />  

<xsl:template match="@*|node()">
  <xsl:copy>
    <xsl:apply-templates select="@*|node()" />
  </xsl:copy>
</xsl:template>

<xsl:template match="*[*/@rend]">
  <xsl:copy>
    <xsl:apply-templates select="@*" />
    <xsl:for-each-group select="node()" group-adjacent="
       if (self::*/@rend) then
           concat( namespace-uri(), '|', local-name(), '|', @rend)
         else
           ''">
      <xsl:choose>
        <xsl:when test="current-grouping-key()" >
          <xsl:for-each select="current-group()[1]">
            <xsl:copy>
              <xsl:apply-templates select="@* | current-group()/node()" />
            </xsl:copy>
          </xsl:for-each>
        </xsl:when>
        <xsl:otherwise>
         <xsl:apply-templates select="current-group()" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>

The advantages of this solution over Martin's are:

  • This merges the children of any parent element, not just p elements.
  • Faster. Merging is done with a single xsl:for-each-group instead of two nested xsl:for-each-group instructions.
  • The non-rend attributes of the first element in each merged group are copied to the output.

Note also:

  • There is no need to test for whitespace-only text nodes, which must be ignored when deciding whether elements with a common name and rend value are "adjacent": the xsl:strip-space declaration removes them beforehand. This keeps the xsl:for-each-group instruction fairly simple and readable.
  • As an alternative to the group-adjacent expression, you could instead use:

    <xsl:for-each-group select="node()" group-adjacent="
       string-join(for $x in self::*/@rend return
         concat( namespace-uri(), '|', local-name(), '|', @rend),'')">
    

    Use whichever form you personally find more readable.
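Outside XSLT, the same group-adjacent merge can be sketched in Python with the standard library's xml.etree.ElementTree. This is a hypothetical port for illustration, not part of the answer above; merge_adjacent, merge_key and _append_text are made-up names. It assumes that whitespace-only text between two elements does not break adjacency (mirroring xsl:strip-space) and ignores comments and processing instructions, which ElementTree's default parser discards. Like the stylesheet, it keeps the attributes of the first element in each run and concatenates the content of all of them.

```python
import xml.etree.ElementTree as ET


def merge_key(el):
    # ElementTree stores a namespaced name as "{uri}local", so el.tag already
    # combines namespace-uri() and local-name(); add @rend to form the key.
    return (el.tag, el.get("rend"))


def _append_text(el, text):
    # Append text after el's existing content: into el.text when el has no
    # children, otherwise into the tail of its last child.
    if not text:
        return
    if len(el):
        el[-1].tail = (el[-1].tail or "") + text
    else:
        el.text = (el.text or "") + text


def merge_adjacent(parent):
    # Process descendants first; content that the loop below moves into a
    # merged element is therefore not regrouped a second time.
    for child in parent:
        merge_adjacent(child)

    children = list(parent)
    i = 0
    while i < len(children) - 1:
        head, nxt = children[i], children[i + 1]
        # Whitespace-only text between two elements does not break
        # adjacency (the effect of xsl:strip-space); any other text does.
        separated = bool(head.tail and head.tail.strip())
        if (head.get("rend") is not None
                and not separated
                and merge_key(head) == merge_key(nxt)):
            # Move nxt's text and child elements into head; head keeps its
            # own attributes and takes over nxt's trailing text.
            _append_text(head, nxt.text)
            for grandchild in list(nxt):
                head.append(grandchild)
            head.tail = nxt.tail
            parent.remove(nxt)
            children.pop(i + 1)
        else:
            i += 1


if __name__ == "__main__":
    root = ET.fromstring(
        '<p>\n<hi rend="bold">aa</hi>\n<hi rend="bold">bb</hi>\n'
        '<hi rend="bold">cc</hi>\nPerhaps some text.\n'
        '<hi rend="italic">dd</hi>\n<hi rend="italic">ee</hi>\n'
        'Some more text.\n<hi rend="italic">ff</hi>\n'
        '<hi rend="italic">gg</hi>\nFoo.\n</p>'
    )
    merge_adjacent(root)
    print(ET.tostring(root, encoding="unicode"))
```

Because ElementTree keeps text in .text and .tail rather than as separate text nodes, most of the work is re-threading those strings when two elements are joined.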

Sean B. Durkin
  • That's great, Sean. Thanks a million. – Tench Oct 02 '12 at 11:04
  • Alas, I am new here and a vote-up requires a reputation of 15 points, which I don't have yet. As soon as I do, I will make sure to up-vote your contribution. – Tench Oct 02 '12 at 12:13

Is the name of that attribute (e.g. rend) known? In that case I think you want

<xsl:template match="p">
  <xsl:copy>
    <xsl:for-each-group select="*" group-adjacent="concat(node-name(.), '|', @rend)">
      <xsl:element name="{name()}" namespace="{namespace-uri()}">
         <xsl:copy-of select="@rend"/>
         <xsl:apply-templates select="current-group()/node()"/>
      </xsl:element>
     </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

[edit] If there can be text nodes with content between the elements, as you have shown in the edit of your input, then you need to nest two groupings, as in this sample:

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0"
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  exclude-result-prefixes="xs">

<xsl:template match="p">
  <xsl:copy>
    <xsl:for-each-group select="node() except text()[not(normalize-space())]" group-adjacent="boolean(self::*)">
      <xsl:choose>
        <xsl:when test="current-grouping-key()">
          <xsl:for-each-group select="current-group()" group-by="concat(node-name(.), '|', @rend)">
            <xsl:element name="{name()}" namespace="{namespace-uri()}">
               <xsl:copy-of select="@rend"/>
               <xsl:apply-templates select="current-group()/node()"/>
            </xsl:element>
          </xsl:for-each-group>
        </xsl:when>
        <xsl:otherwise>
          <xsl:apply-templates select="current-group()"/>
        </xsl:otherwise>
      </xsl:choose>
     </xsl:for-each-group>
  </xsl:copy>
</xsl:template>

</xsl:stylesheet>
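The inner grouping in this sample uses group-by rather than group-adjacent. Within one run of elements (no non-whitespace text between them), group-by merges every element with the same name and rend value, even when a different element sits between them, while group-adjacent only merges consecutive ones. The difference can be sketched in Python with itertools.groupby (consecutive keys only) versus a dict (all occurrences); the string keys are hypothetical stand-ins for the concat(node-name(.), '|', @rend) values.

```python
from itertools import groupby


def group_adjacent(keys):
    # Like group-adjacent: a new group starts whenever the key differs from
    # the previous item's key. Returns (key, group size) pairs.
    return [(key, len(list(items))) for key, items in groupby(keys)]


def group_by(keys):
    # Like group-by: all items with an equal key form one group, ordered by
    # first appearance (Python dicts preserve insertion order).
    counts = {}
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return list(counts.items())


# Hypothetical keys for one run of elements with no text between them.
run = ["hi|bold", "hi|italic", "hi|bold"]
print(group_adjacent(run))  # [('hi|bold', 1), ('hi|italic', 1), ('hi|bold', 1)]
print(group_by(run))        # [('hi|bold', 2), ('hi|italic', 1)]
```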
Martin Honnen
  • Thanks, Martin. This is almost it. But this code does not take into consideration any text which is outside the hi nodes, it only picks the element nodes inside p. I have made some changes in my example to show that. – Tench Sep 30 '12 at 17:30

In case a casual visitor comes along and wonders whether there is an XSLT 1.0 solution to this problem, I offer the following. Note that I am not trying to detract from Sean's and Martin's correct answers; I am merely offering some flavor.

When this XSLT 1.0 solution:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output omit-xml-declaration="no" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key
     name="kFollowing" 
     match="hi" 
     use="concat(@rend, '+', generate-id(following-sibling::text()[1]))" />

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/*">
    <p>
      <xsl:apply-templates 
        select="
          hi[generate-id() = 
             generate-id(
           key('kFollowing', 
             concat(@rend, '+', generate-id(following-sibling::text()[1])))[1])]" />
    </p>
  </xsl:template>

  <xsl:template match="hi">
    <xsl:copy>
      <xsl:apply-templates 
        select="@*|key('kFollowing', 
          concat(@rend, '+', generate-id(following-sibling::text()[1])))/text()" />
    </xsl:copy>
    <xsl:apply-templates select="following-sibling::text()[1]" />
  </xsl:template>

</xsl:stylesheet>

...is applied to the OP's original XML:

<p>
<hi rend="bold">aa</hi>
<hi rend="bold">bb</hi>
<hi rend="bold">cc</hi>
Perhaps some text.
<hi rend="italic">dd</hi>
<hi rend="italic">ee</hi>
Some more text.
<hi rend="italic">ff</hi>
<hi rend="italic">gg</hi>
Foo.
</p>

...the desired result is produced:

<p>
<hi rend="bold">aabbcc</hi>
Perhaps some text.
<hi rend="italic">ddee</hi>
Some more text.
<hi rend="italic">ffgg</hi>
Foo. 
</p>
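The kFollowing key combines @rend with the generated id of the next text sibling, so two hi elements land in the same group exactly when they share a rend value and the same following text node. The Python sketch below models that key on a flat token list (a hypothetical representation, not part of the answer; rend_plus_next_text_groups is a made-up name) to show which elements end up merged. Like the stylesheet, it names hi and rend explicitly.

```python
def rend_plus_next_text_groups(tokens):
    # tokens: hypothetical flat view of p's children, each either
    # ("hi", rend, content) or ("text", None, content), with
    # whitespace-only text already stripped.
    groups = {}
    for i, (kind, rend, content) in enumerate(tokens):
        if kind != "hi":
            continue
        # Position of the next text token stands in for
        # generate-id(following-sibling::text()[1]); None if there is none.
        next_text = next(
            (j for j in range(i + 1, len(tokens)) if tokens[j][0] == "text"),
            None,
        )
        groups.setdefault((rend, next_text), []).append(content)
    # One merged hi per key, in the order in which each key's first member
    # appears (the Muenchian "first in key" step).
    return [(rend, "".join(parts)) for (rend, _), parts in groups.items()]


op_input = [
    ("hi", "bold", "aa"), ("hi", "bold", "bb"), ("hi", "bold", "cc"),
    ("text", None, "Perhaps some text."),
    ("hi", "italic", "dd"), ("hi", "italic", "ee"),
    ("text", None, "Some more text."),
    ("hi", "italic", "ff"), ("hi", "italic", "gg"),
    ("text", None, "Foo."),
]
print(rend_plus_next_text_groups(op_input))
# [('bold', 'aabbcc'), ('italic', 'ddee'), ('italic', 'ffgg')]

# Same rend and same next text node, but not adjacent: the key still joins them.
print(rend_plus_next_text_groups(
    [("hi", "bold", "x"), ("hi", "italic", "y"), ("hi", "bold", "z"), ("text", None, "t")]
))
# [('bold', 'xz'), ('italic', 'y')]
```

The second call shows a property of this key, at least as modeled here: it does not check adjacency, so same-rend elements separated only by other elements are still grouped together.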
ABach