I am stuck with an XML to XML transformation using XSLT 2.0 where I need to transform this:

<p>some mixed content <x h="">START:attr="value"</x> more mixed content <x h="">END</x> other mixed content</p>

To this:

<p>some mixed content <ph attr="value"> more mixed content </ph> other mixed content</p>

So basically I'd like to replace <x h="">START:attr="value"</x> with <ph attr="value">

and <x h="">END</x> with </ph> and process the rest as usual.

Does anyone know if that's possible?

My main issue is that I cannot figure out how to find the x element with the value END and then tell the XSLT processor (I use Saxon) to process the content between the first occurrence of <x h=""> and the second occurrence of <x h=""> and finally write the end tag </ph>. I am familiar with how to create an element (including attributes).

I have a specific template to match the start element START:attr="value". Since the XML document I process contains many other elements I'd prefer a recursive solution, so continue the processing of the found content between START and END by using other existing templates.

Sample XML (note that I don't know in advance if the parent will be a p element)

<p> my sample text <b>mixed</b> more
  <x h="">START:attr="value"</x>
  This is mixed content <i>REALLY</i>, process it normally
  <x h="">END</x>
</p>

My Stylesheet

<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="x[@h][starts-with(., 'START:')]">
    <ph>

       <xsl:for-each-group select="../*" group-starting-with="x[@h][. = 'START:']">
            <xsl:for-each-group select="current-group()" group-ending-with="x[@h][. = 'END']">

               <xsl:apply-templates select="@*|node()|text()"/>

            </xsl:for-each-group>
       </xsl:for-each-group>    
    </ph>
</xsl:template>

<xsl:template match="x[@h][starts-with(., 'END')]"/>

<xsl:template match="node()|@*">
    <xsl:copy copy-namespaces="no">
        <xsl:apply-templates select="node()|@*" /> 
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>

Result

<?xml version="1.0" encoding="UTF-8"?>
<p> my sample text <b>mixed</b> more
  <ph>mixed</ph>
  This is mixed content <i>REALLY</i>, process it normally

</p>

I cannot figure out how to put the complete content between START and END within the tags. Any ideas?

  • See the examples in https://stackoverflow.com/tags/xslt-grouping/info on for-each-group group-starting-with/group-ending-with or in your favourite XSLT text book. – Martin Honnen Oct 02 '19 at 11:14
  • Thanks a lot - I like the "XSLT Fiddle" examples, good to play around with. The issue I see with `for-each-group` is that I cannot use `group-starting-with` and `group-ending-with` at the same time. I got an error message from Saxon when I tried. – deepbluesea70 Oct 02 '19 at 15:18
  • No, you don't use them on the same `for-each-group`, but you can easily use a `for-each-group select="node()" group-starting-with="x[@h][starts-with(., 'START:')]"` and then nest a `for-each-group select="current-group()" group-ending-with="x[@h][. = 'END']"` inside of it. As with most uses of group-starting-with/group-ending-with, you need a boolean check inside to distinguish a matching group from items that do not belong to a matching group, but that is rather straightforward. – Martin Honnen Oct 02 '19 at 15:28
  • If you want to solve that problem with a single construct you would need XQuery's `tumbling window .. start $s when ... end $e when ...`, which is a bit more concise for that kind of check than the XSLT need to nest two grouping instructions. – Martin Honnen Oct 02 '19 at 15:29

1 Answer

I would match on the parent containing those markers and use a nested for-each-group, of course all based on the identity transformation template as the base processing:

  <xsl:template match="p[x[@h][starts-with(., 'START:')]]">
      <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:for-each-group select="node()" group-starting-with="x[@h][starts-with(., 'START:')]">
              <xsl:choose>
                  <xsl:when test="self::x[@h][starts-with(., 'START:')]">
                      <xsl:variable name="value" select="replace(., '(START:attr=&quot;)([^&quot;]*)&quot;', '$2')"/>
                      <xsl:for-each-group select="current-group()[position() gt 1]" group-ending-with="x[@h][. = 'END']">
                          <xsl:choose>
                              <xsl:when test="current-group()[last()][self::x[@h][. = 'END']]">
                                  <ph attr="{$value}">
                                      <xsl:apply-templates select="current-group()[position() ne last()]"/>
                                  </ph>
                              </xsl:when>
                              <xsl:otherwise>
                                  <xsl:apply-templates select="current-group()"/>
                              </xsl:otherwise>
                          </xsl:choose>
                      </xsl:for-each-group>
                  </xsl:when>
                  <xsl:otherwise>
                      <xsl:apply-templates select="current-group()"/>
                  </xsl:otherwise>
              </xsl:choose>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>

An XSLT 3 example is at https://xsltfiddle.liberty-development.net/pPJ8LV4. For XSLT 2, replace the xsl:mode declaration used there with the identity template <xsl:template match="@* | node()"><xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy></xsl:template>.
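For XSLT 2, a complete stylesheet could look like the following sketch. It combines the template above with the identity template. The one change, which is an assumption on my part, is the match pattern `*[x[@h][starts-with(., 'START:')]]`: it accepts any parent element rather than only `p`, because the question says the parent is not known in advance. The sketch also assumes each START marker and its END marker are siblings.

```xml
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="2.0">

  <!-- Identity template: the XSLT 2.0 stand-in for the XSLT 3 xsl:mode declaration -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Assumption: any element that has a START marker as a child, not only p -->
  <xsl:template match="*[x[@h][starts-with(., 'START:')]]">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- Outer grouping: a new group begins at every START marker -->
      <xsl:for-each-group select="node()" group-starting-with="x[@h][starts-with(., 'START:')]">
        <xsl:choose>
          <!-- The context item is the first item of the group: is it a START marker? -->
          <xsl:when test="self::x[@h][starts-with(., 'START:')]">
            <!-- Pull the quoted value out of START:attr="value" -->
            <xsl:variable name="value" select="replace(., '(START:attr=&quot;)([^&quot;]*)&quot;', '$2')"/>
            <!-- Inner grouping: skip the START marker, close a group at each END marker -->
            <xsl:for-each-group select="current-group()[position() gt 1]" group-ending-with="x[@h][. = 'END']">
              <xsl:choose>
                <!-- The group really ends with END: wrap everything before the marker -->
                <xsl:when test="current-group()[last()][self::x[@h][. = 'END']]">
                  <ph attr="{$value}">
                    <xsl:apply-templates select="current-group()[position() ne last()]"/>
                  </ph>
                </xsl:when>
                <!-- Trailing items after END (or START without END): process unwrapped -->
                <xsl:otherwise>
                  <xsl:apply-templates select="current-group()"/>
                </xsl:otherwise>
              </xsl:choose>
            </xsl:for-each-group>
          </xsl:when>
          <!-- Items before the first START marker: process unwrapped -->
          <xsl:otherwise>
            <xsl:apply-templates select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Because the content between the markers goes through xsl:apply-templates, any other templates in a larger stylesheet still apply to it, and the predicate gives this template a higher default priority than the identity template. With Saxon 9 it could be run from the command line with something like `java -jar saxon9he.jar -s:input.xml -xsl:wrap.xsl -o:output.xml`, where the three file names are placeholders.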

As Saxon also supports XQuery, using a tumbling window, where you can check both the start and the end condition together, might be a bit more concise (although in XQuery you have to do extra work to make sure the items that are not being wrapped are passed through, as windowing normally filters out items for which the conditions do not hold):

p ! <p>
{
    for tumbling window $group in node()
    start $s 
      when $s[self::x[@h][starts-with(., 'START:')]] or true()
    end $e 
      when $e[self::x[@h][. = 'END']] and $s[self::x[@h][starts-with(., 'START:')]] or not($s[self::x[@h][starts-with(., 'START:')]])
    return 
        if ($s[self::x[@h][starts-with(., 'START:')]])
        then
            <ph attr="{replace($group[1], '(START:attr=&quot;)([^&quot;]*)&quot;', '$2')}">
            {
                tail($group)[not(position() = last())]
            }
            </ph>
        else $group
}
</p>

https://xqueryfiddle.liberty-development.net/948Fn5s/2

Martin Honnen
  • Thanks for the solution! I could include it into my bigger XSLT, adjust it to my needs, and now it works! I hadn't heard about XQuery at all yet. Is that built into the Saxon engine? – deepbluesea70 Oct 07 '19 at 13:51
  • @deepbluesea70, yes, Saxon 9 supports XSLT as well as XQuery. – Martin Honnen Oct 07 '19 at 13:53