
I have the following flat XML structure:

<div class="section-level-1">

  <!-- other elements -->

  <p class="para">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-german">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-english">
    <img src="..." alt="..." title="..." />
  </p>

  <!-- other elements -->

  <p class="para">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-german">
    <img src="..." alt="..." title="..." />
  </p>
  <misc-element>...</misc-element>
  <p class="figure-caption-english">
    <img src="..." alt="..." title="..." />
  </p>
</div>

The order of these elements is always the same (para -> figure-caption-german -> figure-caption-english); however, I can't rule out that the sequence is interrupted by other elements (here, the misc-element).

I want to wrap these three elements inside a single element, like this:

<div class="section-level-1">

  <!-- other elements -->

  <div class="figure">
    <p class="para">
      <img src="..." alt="..." title="..." />
    </p>
    <p class="figure-caption-german">
      <img src="..." alt="..." title="..." />
    </p>
    <p class="figure-caption-english">
      <img src="..." alt="..." title="..." />
    </p>
  </div>

  <!-- other elements -->

  <div class="figure">
    <p class="para">
      <img src="..." alt="..." title="..." />
    </p>
    <p class="figure-caption-german">
      <img src="..." alt="..." title="..." />
    </p>
    <p class="figure-caption-english">
      <img src="..." alt="..." title="..." />
    </p>
  </div>
</div>

The interrupting element(s) don't need to be preserved and can be deleted.

What I have so far:

<xsl:template match="/">
  <xsl:apply-templates />
</xsl:template>

<!-- Html Ninja Pattern -->

<xsl:template match="*">
  <xsl:element name="{name()}">
    <xsl:apply-templates select="* | @* | text()"/>
  </xsl:element>
</xsl:template>

<xsl:template match="body//@*">
  <xsl:attribute name="{name(.)}">
    <xsl:value-of select="."/>
  </xsl:attribute>
</xsl:template>

<!-- Modify certain elements -->

<xsl:template match="" priority="1">
  <!-- do something -->
</xsl:template>

As a basic pattern I rely on the "Html Ninja Technique" (http://getsymphony.com/learn/articles/view/html-ninja-technique/), since it lets me handle only the particular elements I need to transform while copying all other elements to the output tree unchanged. So far everything has worked fine, but now I seem to have hit a roadblock. I'm not even sure the desired task can be accomplished with the "Html Ninja Technique".

Any help or indication would be highly appreciated.

Best regards and thank you, Matthias Einbrodt

4 Answers


It's a little involved, but I think this should do it:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="*" name="Copy">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="* | @* | text()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{name(.)}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="div[starts-with(@class, 'section-level')]">
    <xsl:copy>
      <xsl:apply-templates select="@*" />
      <!-- Start an iteration at every para, at the first child element,
           and at every element whose immediately preceding sibling is
           a figure-caption-english -->
      <xsl:apply-templates select="p[@class = 'para'] | 
                                 *[not(preceding-sibling::*) or
                                    preceding-sibling::*[1][self::p]
                                      [@class = 'figure-caption-english']
                                  ]"
                           mode="iter"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="p[@class = 'para']" mode="iter">
    <div class="figure">
      <xsl:call-template name="Copy" />
      <!-- Apply templates to the next German and English figure captions -->
      <xsl:apply-templates
        select="following-sibling::p[@class = 'figure-caption-german'][1] |
                following-sibling::p[@class = 'figure-caption-english'][1]" />
    </div>
  </xsl:template>

  <xsl:template match="*" mode="iter">
    <xsl:call-template name="Copy" />
    <xsl:apply-templates 
        select="following-sibling::*[1]
                      [not(self::p[@class = 'para'])]"
        mode="iter"/>
  </xsl:template>
</xsl:stylesheet>

When applied to this sample data:

<div class="section-level-1">

  <!-- other elements -->
  <div>hello</div>
  <div>hello</div>
  <div>hello</div>
  <div>hello</div>
  <p class="para">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-german">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-english">
    <img src="..." alt="..." title="..." />
  </p>

  <!-- other elements -->
  <div>hello</div>
  <div>hello</div>
  <div>hello</div>

  <p class="para">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-german">
    <img src="..." alt="..." title="..." />
  </p>
  <misc-element>...</misc-element>
  <p class="figure-caption-english">
    <img src="..." alt="..." title="..." />
  </p>
  <div>hello</div>
  <div>hello</div>
  <div>hello</div>
</div>

It produces:

<div class="section-level-1">
  <div>hello</div>
  <div>hello</div>
  <div>hello</div>
  <div>hello</div>
  <div class="figure">
    <p class="para">
      <img src="..." alt="..." title="..." />
    </p>
    <p class="figure-caption-german">
      <img src="..." alt="..." title="..." />
    </p>
    <p class="figure-caption-english">
      <img src="..." alt="..." title="..." />
    </p>
  </div>
  <div>hello</div>
  <div>hello</div>
  <div>hello</div>
  <div class="figure">
    <p class="para">
      <img src="..." alt="..." title="..." />
    </p>
    <p class="figure-caption-german">
      <img src="..." alt="..." title="..." />
    </p>
    <p class="figure-caption-english">
      <img src="..." alt="..." title="..." />
    </p>
  </div>
  <div>hello</div>
  <div>hello</div>
  <div>hello</div>
</div>
JLRishe
  • Thanks a lot for your answer, JLRishe. In particular, the XPath expression that selects the first German and the first English caption following the image context node was very insightful and inspiring. I drew on it to implement a multi-pass solution to the problem (see my answer below). – Mar 13 '13 at 09:54

Here's another approach. This one does involve iterating over the child elements of the div, but also makes use of an xsl:key to group the relevant p elements.

First, define a key that groups your 'figure-caption' elements by their nearest preceding 'para' element:

<xsl:key name="para" 
     match="p[starts-with(@class, 'figure-caption')]" 
     use="generate-id(preceding-sibling::p[@class='para'][1])"/>

Then you start off by matching the div element and selecting its first child node:

<xsl:template match="div">
   <div>
      <xsl:apply-templates select="node()[1]" mode="iterate"/>
   </div>
</xsl:template>

The iterate mode marks the templates that recursively process their following sibling. First, you need a template that matches the 'para' element, where you can use the key to group the relevant elements:

<xsl:template match="p[@class='para']" mode="iterate">
   <div class="figure">
      <xsl:apply-templates select=".|key('para', generate-id())" mode="group"/>
   </div>

(The group mode is used so that the template matching the grouped elements just outputs them without continuing on to the next sibling. Alternatively, you could use xsl:copy-of here.)

Within this template, you then continue the iteration by selecting the node after the last element in the group:

<xsl:apply-templates 
     select="key('para', generate-id())[last()]/following-sibling::node()[1]" mode="iterate"/>

Other nodes in the iteration can then be matched by a more generic template that copies them and continues with the next sibling:

<xsl:template match="node()" mode="iterate">
   <xsl:call-template name="identity"/>
   <xsl:apply-templates select="following-sibling::node()[1]" mode="iterate"/>
</xsl:template>

Here, identity is the named identity template, which copies a node unchanged; it is part of the full XSLT.

Here is the full XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="xml" indent="yes"/>
   <xsl:key name="para" match="p[starts-with(@class, 'figure-caption')]" use="generate-id(preceding-sibling::p[@class='para'][1])"/>

   <xsl:template match="div">
      <div>
         <xsl:copy-of select="@*"/>
         <xsl:apply-templates select="node()[1]" mode="iterate"/>
      </div>
   </xsl:template>

   <xsl:template match="p[@class='para']" mode="iterate">
      <div class="figure">
         <xsl:apply-templates select=".|key('para', generate-id())" mode="group"/>
      </div>
      <xsl:apply-templates select="key('para', generate-id())[last()]/following-sibling::node()[1]" mode="iterate"/>
   </xsl:template>

   <xsl:template match="node()" mode="group">
      <xsl:call-template name="identity"/>
   </xsl:template>

   <xsl:template match="node()" mode="iterate">
      <xsl:call-template name="identity"/>
      <xsl:apply-templates select="following-sibling::node()[1]" mode="iterate"/>
   </xsl:template>

   <xsl:template match="@*|node()" name="identity">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

When applied to your sample XML, the following is output:

<div class="section-level-1">
   <!-- other elements -->
   <div class="figure">
      <p class="para">
         <img src="..." alt="..." title="..."/>
      </p>
      <p class="figure-caption-german">
         <img src="..." alt="..." title="..."/>
      </p>
      <p class="figure-caption-english">
         <img src="..." alt="..." title="..."/>
      </p>
   </div>
   <!-- other elements -->
   <div class="figure">
      <p class="para">
         <img src="..." alt="..." title="..."/>
      </p>
      <p class="figure-caption-german">
         <img src="..." alt="..." title="..."/>
      </p>
      <p class="figure-caption-english">
         <img src="..." alt="..." title="..."/>
      </p>
   </div>
</div>

One advantage of this approach is that you can add captions in languages other than English and German and it should still work; the order of the captions would not matter either, because the key matches any p whose class starts with 'figure-caption'. (Of course, if you want to ignore other languages, this flexibility works against you!)
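One caveat, assuming a 'para' paragraph can ever occur without any caption siblings: key('para', generate-id()) is then empty, so the select in the continuing xsl:apply-templates finds no node and the iteration stops, silently dropping everything after that paragraph. A possible adjustment, sketched below as a drop-in replacement for the iterate-mode 'para' template (the variable name group is an illustrative choice), is to include the para itself in the node-set whose last member is used as the continuation point:

```xml
<xsl:template match="p[@class='para']" mode="iterate">
   <!-- the para plus any captions grouped under it by the 'para' key -->
   <xsl:variable name="group" select=".|key('para', generate-id())"/>
   <div class="figure">
      <xsl:apply-templates select="$group" mode="group"/>
   </div>
   <!-- $group always contains at least the para, so $group[last()]
        (last in document order) is the last caption if there is one,
        otherwise the para itself; the iteration therefore never stops early -->
   <xsl:apply-templates select="$group[last()]/following-sibling::node()[1]" mode="iterate"/>
</xsl:template>
```

Such a para is still wrapped in div class="figure"; excluding it would need an extra condition on the match, for example restricting it to p[@class='para'][img] as in the asker's own answer.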

Tim C
  • Hello Tim C., thank you for your answer. I'll only be able to look into it in more detail over the next few days. However, I'm really intrigued that there isn't just one way of doing this in XSLT. – Mar 13 '13 at 09:57

A simple XSLT 2.0 solution:

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:param name="pClasses" select=
 "'para', 'figure-caption-german', 'figure-caption-english'"/>

 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/*">
  <xsl:copy>
   <xsl:apply-templates select="@*"/>

   <xsl:for-each-group select="p[@class=$pClasses]"
     group-starting-with="p[@class eq $pClasses[1]]">
     <div class="figure">
       <xsl:apply-templates select="current-group()"/>
     </div>
    </xsl:for-each-group>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied on the provided XML document:

<div class="section-level-1">

  <!-- other elements -->

  <p class="para">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-german">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-english">
    <img src="..." alt="..." title="..." />
  </p>

  <!-- other elements -->

  <p class="para">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-german">
    <img src="..." alt="..." title="..." />
  </p>
  <misc-element>...</misc-element>
  <p class="figure-caption-english">
    <img src="..." alt="..." title="..." />
  </p>
</div>

the wanted, correct result is produced:

<div class="section-level-1">
   <div class="figure">
      <p class="para">
         <img src="..." alt="..." title="..."/>
      </p>
      <p class="figure-caption-german">
         <img src="..." alt="..." title="..."/>
      </p>
      <p class="figure-caption-english">
         <img src="..." alt="..." title="..."/>
      </p>
   </div>
   <div class="figure">
      <p class="para">
         <img src="..." alt="..." title="..."/>
      </p>
      <p class="figure-caption-german">
         <img src="..." alt="..." title="..."/>
      </p>
      <p class="figure-caption-english">
         <img src="..." alt="..." title="..."/>
      </p>
   </div>
</div>
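The xsl:for-each-group above selects only the p children with one of the three classes, so any other content of the section (what the <!-- other elements --> placeholders stand for) is not copied to the output. If that content must be kept, a possible variation is sketched below. It assumes, like the other answers, that every p class="para" starts a figure and that the in-between content follows the English caption. A new group then starts at each para and at each element directly after an English caption; figure groups are filtered and wrapped, all other groups are copied unchanged.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:param name="pClasses" select=
 "'para', 'figure-caption-german', 'figure-caption-english'"/>

 <!-- identity template: copy everything else unchanged -->
 <xsl:template match="node()|@*">
  <xsl:copy>
   <xsl:apply-templates select="node()|@*"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="/*">
  <xsl:copy>
   <xsl:apply-templates select="@*"/>
   <!-- a group starts at every para and at every element whose immediately
        preceding sibling is the English caption (content between figures) -->
   <xsl:for-each-group select="*" group-starting-with=
     "p[@class eq $pClasses[1]]
    | *[preceding-sibling::*[1][self::p[@class eq $pClasses[3]]]]">
    <xsl:choose>
     <!-- the context item is the first item of the current group -->
     <xsl:when test="self::p[@class eq $pClasses[1]]">
      <!-- figure group: keep only the three classes, drop interrupting elements -->
      <div class="figure">
       <xsl:apply-templates select="current-group()[self::p[@class = $pClasses]]"/>
      </div>
     </xsl:when>
     <xsl:otherwise>
      <!-- content before or between figures: copy it unchanged -->
      <xsl:apply-templates select="current-group()"/>
     </xsl:otherwise>
    </xsl:choose>
   </xsl:for-each-group>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
```

Like the original, it selects only element children, so comments are not copied, and an interrupting element inside a figure group (such as misc-element) is still dropped by the current-group() filter. On data like the sample in JLRishe's answer, the hello divs should stay in their original positions.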
Dimitre Novatchev
  • Hello Dimitre, thank you very much for your solution. Short and simple. I'll try it out over the next few days and let you know how it went. – Mar 13 '13 at 10:00

Inspired by JLRishe's solution, I implemented a multi-pass solution to the problem using different template modes.

Given the following flat XML structure:

<div class="section-level-1">

  <!-- other elements -->

  <p class="para">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-german">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-english">
    <img src="..." alt="..." title="..." />
  </p>

  <!-- other elements -->

  <p class="para">
    <img src="..." alt="..." title="..." />
  </p>
  <p class="figure-caption-german">
    <img src="..." alt="..." title="..." />
  </p>
  <misc-element>...</misc-element>
  <p class="figure-caption-english">
    <img src="..." alt="..." title="..." />
  </p>
</div>

I applied the following approach.

<xsl:template match="/">
  <xsl:variable name="pass0">
    <xsl:apply-templates mode="pass0" />
  </xsl:variable>

  <xsl:variable name="pass1">
    <xsl:for-each select="$pass0">
      <xsl:apply-templates mode="pass1" />
    </xsl:for-each>
  </xsl:variable>

  <xsl:copy-of select="$pass1" />        
</xsl:template>

<!--###############
    ### Pass 0 #### 
    ###############-->

<xsl:template match="*" mode="pass0">
  <xsl:element name="{name()}">
    <xsl:apply-templates select="* | @* | text()" mode="pass0"/>
  </xsl:element>
</xsl:template>

<xsl:template match="@*" mode="pass0">
  <xsl:attribute name="{name(.)}">
    <xsl:value-of select="."/>
  </xsl:attribute>
</xsl:template>

<!-- wraps figures and their associated captions within <div class="figure"> element -->
<xsl:template match="p[@class = 'para'][img]" mode="pass0" priority="1">
  <div class="figure">
    <xsl:copy-of select="./img" />
    <xsl:apply-templates 
      select="following-sibling::p[@class = 'figure-caption-german'][1] |
              following-sibling::p[@class = 'figure-caption-english'][1]" 
      mode="fig-captions-pass0" />
  </div>
</xsl:template>

<xsl:template match="*" mode="fig-captions-pass0" priority="1">
  <xsl:copy-of select="." />
</xsl:template>

<!--###############
    ### Pass 1 #### 
    ###############-->

<xsl:template match="*" mode="pass1">
  <xsl:element name="{name()}">
    <xsl:apply-templates select="* | @* | text()" mode="pass1"/>
  </xsl:element>
</xsl:template>

<xsl:template match="@*" mode="pass1">
  <xsl:attribute name="{name(.)}">
    <xsl:value-of select="."/>
  </xsl:attribute>
</xsl:template>

<!-- removes all elements with figure captions that don't reside within <div class="figure"> element and all other unnecessary elements -->
<xsl:template match="
  p[@class = 'figure-caption-german'][not(parent::div[@class = 'figure'])] |
  p[@class = 'figure-caption-english'][not(parent::div[@class = 'figure'])] |
  misc-element" 
  mode="pass1" priority="1" />

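Applied to a fuller document that also contains ordinary text paragraphs and text captions, this gives the desired output (the figure div contains the bare img rather than the original para, because the template copies only ./img):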

<div class="section-level-1">

  <p class="para">normal paragraph etc.</p>
  <p class="para">normal paragraph etc.</p>
  <p class="para">normal paragraph etc.</p>

  <div class="figure">
    <img src="..." alt="..." title="..."></img>
    <p class="figure-caption-german">
      figure caption in german
    </p>
    <p class="figure-caption-english">
      figure caption in english
    </p>
  </div>

  <p class="para">normal paragraph etc.</p>
  <p class="para">normal paragraph etc.</p>
  <p class="para">normal paragraph etc.</p>

  <div class="figure">
    <img src="..." alt="..." title="..."></img>
    <p class="figure-caption-german">
      figure caption in german
    </p>
    <p class="figure-caption-english">
      figure caption in english 
    </p>
  </div>
</div>