I've been having a bit of trouble with an HTML file that I'm trying to translate. Basically, the relevant part of the source structure as it currently stands is this:

<h2 />
<h3 />
<table />
<table />
<h3 />
<table />
<table />
<h3 />
<table />
<h3 />
<h3 />
<table />
<table />
<h2 />
<h3 />
...

and so on. The contents of each of these elements are translated in different ways, but the problem I'm currently having is grouping them correctly. Essentially, I want the result to look like the following:

<category>
    <h2 />
    <container>
        <h3 />
        <table />
        <table />
    </container>
    <container>
        <h3 />
        <table />
        <table />
    </container>
    <container>
        <h3 />
        <table />
    </container>
    <container>
        <h3 />
    </container>
    <container>    
        <h3 />
        <table />
        <table />
    </container>
</category>
<category>
    <h2 />
    <container>
        <h3 />
        ...

To achieve this, I've been using the following code:

<xsl:for-each-group select="node()" group-starting-with="xh:h2">
    <category>
        <xsl:apply-templates select="xh:h2"/>
        <xsl:for-each-group select="current-group()" 
                    group-starting-with="xh:h3">
            <container>
                <xsl:apply-templates select="current-group()[node()]"/>
            </container>
        </xsl:for-each-group>
    </category>
</xsl:for-each-group>

However, the output I get from this is as follows:

<category>
    <h2 />
    <container>
        <h3 />
        <table />
        <table />
        <h3 />
        <table />
        <table />
        <h3 />
        <table />
        <h3 />   
        <h3 />
        <table />
        <table />
    </container>
</category>
<category>
    <h2 />
    <container>
        <h3 />
        ...

The outer xsl:for-each-group is working as expected, but the inner one does not appear to be. If I use <xsl:copy-of> to output the first element of current-group() in the inner xsl:for-each-group, it shows the <h2> element, which should not even be in that group.

If anyone can point out where I'm going wrong, or offer a better solution, it would be greatly appreciated.

Dan McElroy

2 Answers

I think you've simplified the problem and in doing so have introduced some red herrings.

The xsl:apply-templates select="h2" surely does nothing, because none of the nodes selected in the outer grouping has an h2 child.

In every group selected by the outer for-each-group, except the first, the first node in the group will be an h2 element, by definition. Your inner for-each-group will partition the sequence of nodes starting with an h2 into: first, a group that starts with the h2 (because every node becomes part of some group), and then a sequence of groups each of which starts with an h3. You need to split out the first (non-h3) group and treat it differently, because you don't want to generate a container element in this case. So you need an xsl:choose in the inner for-each-group, typically with the condition xsl:when test="self::h2" to detect that you're processing the special first group.
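
The special-casing described above might look like the following sketch. It assumes the headings and tables are direct children of xh:body and that the first of them is an h2; the `result` wrapper element, the template on `/` and the copy-only template at the end are placeholders for illustration, not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xh="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xh">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: the elements to group are the children of /xh:html/xh:body. -->
  <xsl:template match="/">
    <result>
      <xsl:for-each-group select="/xh:html/xh:body/*"
                          group-starting-with="xh:h2">
        <category>
          <xsl:for-each-group select="current-group()"
                              group-starting-with="xh:h3">
            <xsl:choose>
              <!-- The context item is the first item of the inner group.
                   If it is the h2, this is the leading non-h3 group,
                   so it is processed without a container. -->
              <xsl:when test="self::xh:h2">
                <xsl:apply-templates select="current-group()"/>
              </xsl:when>
              <!-- Every other inner group starts with an h3. -->
              <xsl:otherwise>
                <container>
                  <xsl:apply-templates select="current-group()"/>
                </container>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </category>
      </xsl:for-each-group>
    </result>
  </xsl:template>

  <!-- Placeholder for the real translation templates: copy elements as they are. -->
  <xsl:template match="xh:h2 | xh:h3 | xh:table">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:for-each-group is an XSLT 2.0 instruction, this needs an XSLT 2.0 processor such as Saxon. If the body can start with something other than an h2, the first outer group would fall into the xsl:otherwise branch, so the test would need adjusting for that case.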

Having said all that, I can't see why you aren't getting a container element for each h3 element. I think this must be caused by something that you haven't shown us (perhaps a namespace issue?).

Michael Kay
  • Thanks for the help! The check for an `<h2>` element certainly solved some weird issues I was getting elsewhere; I had no idea about that first group. As you suggest, I have simplified the problem somewhat. As this is dealing with an XHTML representation of a Microsoft Word document, the elements are quite inconsistent, and h2, in reality, is actually found by the condition `node()[name()='h1' or name()='h2' or @class='Heading2NB' or @class='Heading2NoBreak' or @class='Heading2PageBreak']`, and h3 is found by `node()[name()='h3' or @class='Heading3NB' or @class='Heading3NoBreak']`. Thanks again. – Dan McElroy Jul 04 '13 at 08:14
  • Ran out of space. As far as namespaces are concerned, I have all of the XHTML elements under `xmlns:xh="http://www.w3.org/1999/xhtml"`, and everywhere that I'm not checking against name(), the elements are prefixed with `xh:`. I have a feeling that name() should not work without the namespace, but it has so far in other operations on the XHTML file. If you think it would be relevant, I could add the conditions in the above comment to the question, but I had assumed that it would only serve to clutter up an already lengthy question. – Dan McElroy Jul 04 '13 at 08:28

I think you want to change

<xsl:for-each-group select="node()" group-starting-with="xh:h2">
    <category>
        <xsl:apply-templates select="xh:h2"/>
        <xsl:for-each-group select="current-group()" 
                    group-starting-with="xh:h3">

to

<xsl:for-each-group select="*" group-starting-with="xh:h2">
    <category>
        <xsl:apply-templates select="."/>
        <xsl:for-each-group select="current-group() except ." 
                    group-starting-with="xh:h3">

That way the inner for-each-group processes the h3 and table elements but not the h2 element starting the outer group.

If you need more help, then consider posting small but complete samples, with namespaces present, that allow us to reproduce the problem and the undesired output.
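
As a rough sketch, the change could sit in a small but complete stylesheet like the one below. It assumes the elements to group are the children of /xh:html/xh:body and that the first of them is an h2; the `result` wrapper, the template on `/` and the copy-only template are illustrative placeholders rather than anything from the question.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xh="http://www.w3.org/1999/xhtml"
    exclude-result-prefixes="xh">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumption: headings and tables are siblings directly under xh:body. -->
  <xsl:template match="/">
    <result>
      <xsl:for-each-group select="/xh:html/xh:body/*"
                          group-starting-with="xh:h2">
        <category>
          <!-- "." is the first item of the outer group, i.e. the h2 itself. -->
          <xsl:apply-templates select="."/>
          <!-- Drop that h2 from the sequence before grouping by h3,
               so every inner group starts with an h3. -->
          <xsl:for-each-group select="current-group() except ."
                              group-starting-with="xh:h3">
            <container>
              <xsl:apply-templates select="current-group()"/>
            </container>
          </xsl:for-each-group>
        </category>
      </xsl:for-each-group>
    </result>
  </xsl:template>

  <!-- Placeholder for the real translation templates: copy elements as they are. -->
  <xsl:template match="xh:h2 | xh:h3 | xh:table">
    <xsl:copy-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

With a body containing h2, h3, table, table, h3, table in that order, this should produce a single category holding the h2 followed by two container elements. It requires an XSLT 2.0 processor, since both xsl:for-each-group and the except operator are unavailable in XSLT 1.0.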

Martin Honnen
  • Success! Thanks very much. Not sure how the omission of this code caused such spectacular failure, but the problem is solved. Also, it's too few characters to edit in, but did you mean for the 4th line of your solution to end on a quote? – Dan McElroy Jul 04 '13 at 12:39
  • Sorry about the missing quote, I copied from your question and simply edited what I wanted to correct, not noticing that a closing quote was missing on the `select` attribute value. – Martin Honnen Jul 04 '13 at 13:13