
This is my XML document (a small snippet):

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

<w:body>

    <w:p> <!-- Current Node -->
        <w:pPr>
            <w:pStyle w:val="Heading1"/>
        </w:pPr>

        <w:r>
            <w:t>
                 Paragraph1
            </w:t>
        </w:r>
    </w:p>

    <w:tbl>
        <w:t>table info
        </w:t>
    </w:tbl>

    <w:p>
        <w:pPr>
        </w:pPr>
        <w:r>
            <w:t>
                 Paragraph2
            </w:t>
        </w:r>
    </w:p>

    <w:p>
        <w:pPr>
        </w:pPr>
        <w:r>
            <w:t>
                 Paragraph3
            </w:t>
        </w:r>
    </w:p>

    <w:p>
        <w:pPr>
            <w:pStyle w:val="Heading1"/>
        </w:pPr>

        <w:r>
            <w:t>
                 Paragraph4
            </w:t>
        </w:r>
    </w:p>

    <w:p>
        <w:pPr>
        </w:pPr>
        <w:r>
            <w:t>
                 Paragraph5
            </w:t>
        </w:r>
    </w:p>

    <w:p>
        <w:pPr>
            <w:pStyle w:val="Heading1"/>
        </w:pPr>

        <w:r>
            <w:t>
                 Paragraph6
            </w:t>
        </w:r>
    </w:p>

</w:body>
</w:document>

Here, I want to select the following siblings of the first <w:p> with a for-each statement, stopping when it reaches the next <w:p> that has <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>.

For example, for the first <w:p> I want to select only the next three following siblings (i.e. the table info, Paragraph2 and Paragraph3), because the 4th <w:p> has <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>.

The same applies when the 4th <w:p> is the current node: then I want to select only the 5th <w:p>.

I don't know how to specify this condition in for-each. Can you guide me on how to do this?

My required output looks like this:

<Document>
   <Heading1>
        <paragraph>Paragraph1</paragraph>
        <paragraph>table info</paragraph>
        <paragraph>Paragraph2</paragraph>
        <paragraph>Paragraph3</paragraph>
   </Heading1>
   <Heading1>
        <paragraph>Paragraph4</paragraph>
        <paragraph>Paragraph5</paragraph>
   </Heading1>
   <Heading1>
        <paragraph>Paragraph6</paragraph>
   </Heading1>
</Document>
Saravanan

4 Answers


Here, I want to select the following siblings of the first <w:p> with a for-each statement, stopping when it reaches the next <w:p> that has <w:pPr><w:pStyle w:val="Heading1"/></w:pPr>.

This XSLT 2.0 transformation shows one way of doing it, using the XPath 2.0 operator >>:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:xs="http://www.w3.org/2001/XMLSchema"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  exclude-result-prefixes="w xs">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="*/w:p[1]">

  <xsl:variable name="vNextWP" select=
  "following-sibling::w:p
    [w:pPr/w:pStyle/@w:val='Heading1']
     [1]
  "/>

  <xsl:copy-of select=
  "following-sibling::w:p[$vNextWP >> .]"/>
 </xsl:template>

 <xsl:template match="text()"/>
</xsl:stylesheet>

when applied on the provided XML document:

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

    <w:body>

        <w:p> <!-- Current Node -->
            <w:pPr>
                <w:pStyle w:val="Heading1"/>
            </w:pPr>

            <w:r>
                <w:t>
                     Paragraph1
                </w:t>
            </w:r>
        </w:p>

        <w:p>
            <w:pPr>
            </w:pPr>
            <w:r>
                <w:t>
                     Paragraph2
                </w:t>
            </w:r>
        </w:p>

        <w:p>
            <w:pPr>
            </w:pPr>
            <w:r>
                <w:t>
                     Paragraph3
                </w:t>
            </w:r>
        </w:p>

        <w:p>
            <w:pPr>
                <w:pStyle w:val="Heading1"/>
            </w:pPr>

            <w:r>
                <w:t>
                     Paragraph4
                </w:t>
            </w:r>
        </w:p>

        <w:p>
            <w:pPr>
            </w:pPr>
            <w:r>
                <w:t>
                     Paragraph5
                </w:t>
            </w:r>
        </w:p>

        <w:p>
            <w:pPr>
                <w:pStyle w:val="Heading1"/>
            </w:pPr>

            <w:r>
                <w:t>
                     Paragraph6
                </w:t>
            </w:r>
        </w:p>

    </w:body>
</w:document>

exactly the wanted nodes are selected and copied to the output:

<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
            <w:pPr>
            </w:pPr>
            <w:r>
                <w:t>
                     Paragraph2
                </w:t>
            </w:r>
        </w:p>
<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
            <w:pPr>
            </w:pPr>
            <w:r>
                <w:t>
                     Paragraph3
                </w:t>
            </w:r>
        </w:p>
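
The stylesheet above handles only the first w:p, copies only w:p siblings, and selects nothing when no later Heading1 paragraph exists (the filter compares against an empty sequence). A minimal sketch of a variant, assuming you want every Heading1 paragraph treated the same way, w:tbl siblings included, and the last group running to the end of the body:

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
 exclude-result-prefixes="w">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Assumption: every Heading1 paragraph starts a group, not only the first w:p -->
 <xsl:template match="w:p[w:pPr/w:pStyle/@w:val = 'Heading1']">

  <xsl:variable name="vNextWP" select=
  "following-sibling::w:p
    [w:pPr/w:pStyle/@w:val = 'Heading1']
     [1]
  "/>

  <!-- All element siblings (w:p and w:tbl) that precede the next heading,
       or all remaining siblings when there is no later heading -->
  <xsl:copy-of select=
  "following-sibling::*[empty($vNextWP) or $vNextWP >> .]"/>
 </xsl:template>

 <xsl:template match="text()"/>
</xsl:stylesheet>
```

For the heading paragraphs the copies are emitted one after another without a wrapper element, so this only illustrates the selection; the grouping solutions are the better fit when a wrapper per heading is needed.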

Update: The OP has clarified that the wanted result of the transformation is a grouping, so here are two solutions:

I. XSLT 1.0 solution:

 <xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
 exclude-result-prefixes="w">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kFollowing"
  match="w:p[not(w:pPr/w:pStyle/@w:val = 'Heading1')]
        |
         w:tbl"
  use="generate-id(preceding-sibling::w:p
           [w:pPr/w:pStyle/@w:val = 'Heading1'][1])
  "/>

 <xsl:template match="/*">
  <Document>
   <xsl:apply-templates/>
  </Document>
 </xsl:template>

 <xsl:template match=
 "w:p[w:pPr/w:pStyle/@w:val = 'Heading1']">

  <Heading1>
   <xsl:apply-templates mode="inGroup" select=
    ". | key('kFollowing', generate-id())"/>
   </Heading1>
 </xsl:template>

 <xsl:template match="*" mode="inGroup">
  <paragraph>
    <xsl:value-of select="normalize-space(.//w:t)"/>
  </paragraph>
 </xsl:template>

 <xsl:template match="w:body/*" priority="-1"/>
</xsl:stylesheet>

when this transformation is applied on the newly provided XML document:

<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">

<w:body>

    <w:p> <!-- Current Node -->
        <w:pPr>
            <w:pStyle w:val="Heading1"/>
        </w:pPr>

        <w:r>
            <w:t>
                 Paragraph1
            </w:t>
        </w:r>
    </w:p>

    <w:tbl>
        <w:t>table info
        </w:t>
    </w:tbl>

    <w:p>
        <w:pPr>
        </w:pPr>
        <w:r>
            <w:t>
                 Paragraph2
            </w:t>
        </w:r>
    </w:p>

    <w:p>
        <w:pPr>
        </w:pPr>
        <w:r>
            <w:t>
                 Paragraph3
            </w:t>
        </w:r>
    </w:p>

    <w:p>
        <w:pPr>
            <w:pStyle w:val="Heading1"/>
        </w:pPr>

        <w:r>
            <w:t>
                 Paragraph4
            </w:t>
        </w:r>
    </w:p>

    <w:p>
        <w:pPr>
        </w:pPr>
        <w:r>
            <w:t>
                 Paragraph5
            </w:t>
        </w:r>
    </w:p>

    <w:p>
        <w:pPr>
            <w:pStyle w:val="Heading1"/>
        </w:pPr>

        <w:r>
            <w:t>
                 Paragraph6
            </w:t>
        </w:r>
    </w:p>

</w:body>
</w:document>

the wanted, correct result is produced:

<Document>
   <Heading1>
      <paragraph>Paragraph1</paragraph>
      <paragraph>table info</paragraph>
      <paragraph>Paragraph2</paragraph>
      <paragraph>Paragraph3</paragraph>
   </Heading1>
   <Heading1>
      <paragraph>Paragraph4</paragraph>
      <paragraph>Paragraph5</paragraph>
   </Heading1>
   <Heading1>
      <paragraph>Paragraph6</paragraph>
   </Heading1>
</Document>

II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  exclude-result-prefixes="w"   >
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/*">
     <Document>
      <xsl:for-each-group
      select="*/*"
      group-starting-with="w:p[w:pPr/w:pStyle/@w:val = 'Heading1']">
       <Heading1>
         <xsl:for-each select="current-group()//w:t">
           <paragraph>
             <xsl:sequence select="normalize-space(.)"/>
           </paragraph>
         </xsl:for-each>
       </Heading1>
      </xsl:for-each-group>
     </Document>
 </xsl:template>
</xsl:stylesheet>

when this XSLT 2.0 transformation is applied on the same XML document (above), the same wanted, correct result is produced:

<Document>
   <Heading1>
      <paragraph>Paragraph1</paragraph>
      <paragraph>table info</paragraph>
      <paragraph>Paragraph2</paragraph>
      <paragraph>Paragraph3</paragraph>
   </Heading1>
   <Heading1>
      <paragraph>Paragraph4</paragraph>
      <paragraph>Paragraph5</paragraph>
   </Heading1>
   <Heading1>
      <paragraph>Paragraph6</paragraph>
   </Heading1>
</Document>
Dimitre Novatchev

This could be achieved (in XSLT 1.0) by means of a key that groups the w:t elements by the nearest preceding w:pPr/w:pStyle element:

<xsl:key 
   name="text" 
   match="w:t" 
   use="generate-id(preceding::w:pPr[w:pStyle][1]/w:pStyle)" />

Then, for any (or all) specific w:pStyle elements, you can get all the associated text elements, like so:

<xsl:apply-templates select="key('text', generate-id())" />

So, the following XSLT....

<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" 
  exclude-result-prefixes="w">

   <xsl:output method="xml" indent="yes" />

   <xsl:key 
      name="text" 
      match="w:t" 
      use="generate-id(preceding::w:pPr[w:pStyle][1]/w:pStyle)" />

   <xsl:template match="/">
      <Document>
         <xsl:apply-templates select="//w:pPr/w:pStyle" />
      </Document>
   </xsl:template>

   <xsl:template match="w:pStyle">
      <xsl:element name="{@w:val}">
         <xsl:apply-templates select="key('text', generate-id())" />
      </xsl:element>
   </xsl:template>

   <xsl:template match="w:t">
      <paragraph><xsl:value-of select="normalize-space(.)" /></paragraph>
   </xsl:template>
</xsl:stylesheet>

When applied to your sample input XML document, the following is output:

<Document>
   <Heading1>
      <paragraph>Paragraph1</paragraph>
      <paragraph>table info</paragraph>
      <paragraph>Paragraph2</paragraph>
      <paragraph>Paragraph3</paragraph>
   </Heading1>
   <Heading1>
      <paragraph>Paragraph4</paragraph>
      <paragraph>Paragraph5</paragraph>
   </Heading1>
   <Heading1>
      <paragraph>Paragraph6</paragraph>
   </Heading1>
</Document>
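
One thing to watch: the key above starts a new group at every w:pStyle, whatever its value, and names the output element after it. A hedged sketch of a variant, assuming real documents may also carry other styles on body paragraphs (the sample has none) and that only Heading1 should open a group:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
  exclude-result-prefixes="w">

   <xsl:output method="xml" indent="yes" />

   <!-- Index each w:t by the nearest preceding Heading1 style only -->
   <xsl:key
      name="text"
      match="w:t"
      use="generate-id(preceding::w:pStyle[@w:val = 'Heading1'][1])" />

   <xsl:template match="/">
      <Document>
         <!-- Visit only the Heading1 styles; other styles never start a group -->
         <xsl:apply-templates select="//w:pPr/w:pStyle[@w:val = 'Heading1']" />
      </Document>
   </xsl:template>

   <xsl:template match="w:pStyle">
      <Heading1>
         <xsl:apply-templates select="key('text', generate-id())" />
      </Heading1>
   </xsl:template>

   <xsl:template match="w:t">
      <paragraph><xsl:value-of select="normalize-space(.)" /></paragraph>
   </xsl:template>
</xsl:stylesheet>
```

On the sample input this should group the text the same way as the stylesheet above; any w:t that comes before the first Heading1 gets an empty key value and is simply not output.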
Tim C

I suspect you want to do positional grouping: grouping the siblings and starting or ending a group whenever some condition is satisfied. If that describes the problem, look at using xsl:for-each-group with the group-starting-with or group-ending-with attribute.
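
A minimal sketch of the group-starting-with form (XSLT 2.0), written against the sample document from the question. Two choices here are assumptions rather than requirements stated by the asker: content that appears before the first Heading1 paragraph is dropped, and each child of w:body becomes one paragraph element with its w:t texts joined by a space.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
 exclude-result-prefixes="w">
 <xsl:output indent="yes"/>

 <xsl:template match="/">
  <Document>
   <!-- A new group begins at every Heading1 paragraph -->
   <xsl:for-each-group select="w:document/w:body/*"
        group-starting-with="w:p[w:pPr/w:pStyle/@w:val = 'Heading1']">
    <!-- The context item is the first item of the group; skip a leading
         group that does not begin with a heading (assumption) -->
    <xsl:if test="self::w:p[w:pPr/w:pStyle/@w:val = 'Heading1']">
     <Heading1>
      <xsl:for-each select="current-group()">
       <paragraph>
        <xsl:value-of select="normalize-space(string-join(.//w:t, ' '))"/>
       </paragraph>
      </xsl:for-each>
     </Heading1>
    </xsl:if>
   </xsl:for-each-group>
  </Document>
 </xsl:template>
</xsl:stylesheet>
```

group-ending-with works the same way but closes a group at the matching item, which fits documents where a marker follows the content rather than preceding it.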

Michael Kay

There is an alternative way that cheats slightly by using CDATA. Here is my version:

1/ For the first and last elements (by position()), I use CDATA to open and close the <Heading1> tag.

2/ If an element between the first and last is a w:p with a Heading1 style, one Heading1 tag has to be closed and another opened.

3/ If nothing else matches, the element must be a plain paragraph.

  <xsl:for-each select="//w:body/*">
        <xsl:choose>
            <!-- If you are the first one, create the heading element -->
            <xsl:when test="position() = 1">
                <xsl:text disable-output-escaping="yes">
                <![CDATA[
                <Heading1>
                ]]>
                </xsl:text>
                <paragraph>
                    <xsl:value-of select=".//w:t" />
                </paragraph>
            </xsl:when>
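            <!-- If you are last AND a heading (as Paragraph6 is in the question's sample),
                 close the previous group and emit a group of your own -->
            <xsl:when test="position() = last() and w:pPr/w:pStyle/@w:val = 'Heading1'">
                <xsl:text disable-output-escaping="yes">
                <![CDATA[
                </Heading1><Heading1>
                ]]>
                </xsl:text>
                <paragraph>
                    <xsl:value-of select=".//w:t" />
                </paragraph>
                <xsl:text disable-output-escaping="yes">
                <![CDATA[
                </Heading1>
                ]]>
                </xsl:text>
            </xsl:when>
            <!-- If you are last close the element -->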
            <xsl:when test="position() = last()">
                <paragraph>
                    <xsl:value-of select=".//w:t" />
                </paragraph>
                <xsl:text disable-output-escaping="yes">
                <![CDATA[
                </Heading1>
                ]]>
                </xsl:text>
            </xsl:when>
            <!-- If you are in-between first and last open and close -->
            <xsl:when test="w:pPr/w:pStyle/@w:val = 'Heading1'">
                <xsl:text disable-output-escaping="yes">
                <![CDATA[
                </Heading1><Heading1>
                ]]>
                </xsl:text>
                <!-- Then emit the heading paragraph itself as the first paragraph of the new group -->
                <paragraph>
                    <xsl:value-of select=".//w:t" />
                </paragraph>
            </xsl:when>
            <xsl:otherwise>
                <!-- Nothing matches that means we need to pick up the paragraph -->
                <paragraph>
                    <xsl:value-of select=".//w:t" />
                </paragraph>
            </xsl:otherwise>
            </xsl:choose>
    </xsl:for-each>

This gives the following output (I duplicated nodes in the XML for testing):

<Heading1>
    <paragraph> Paragraph1 </paragraph>
    <paragraph>table info </paragraph>
    <paragraph> Paragraph2 </paragraph>
    <paragraph> Paragraph3 </paragraph>

</Heading1>
<Heading1>
    <paragraph> Paragraph1 </paragraph>
    <paragraph>table info </paragraph>
    <paragraph> Paragraph2 </paragraph>
    <paragraph> Paragraph3 </paragraph>
</Heading1>
First Zero