
I have XML like this:

<doc>
    <para>texttext<page>1</page>texttext<page>1</page>texttext</para>
    <para>texttext<page>1</page><page>2</page>texttext</para>
    <para>texttext<page>1</page><page>2</page><page>3</page>texttext<page>4</page><page>5</page><page>6</page>texttext</para>
    <para>texttext<page>1</page><page>2</page><page>3</page><page>4</page>texttext</para>
</doc>

I need to transform <page> nodes into <link> nodes using an XSLT transform, with the following rules:

  • if a single <page> node appears (not followed by another <page> node), it is simply transformed to <link>
  • if two <page> nodes appear in succession (scenario 2 in the example above), a ',' has to be added between the output <link> nodes
  • if 3 or more <page> nodes appear in succession (scenarios 3 and 4 in the example above), only the first and last <page> nodes are output as <link> nodes, separated by '-'

So the output should look like this:

<doc>
    <para>texttext<link>1</link>texttext<link>1</link>texttext</para>
    <para>texttext<link>1</link>,<link>2</link>texttext</para>
    <para>texttext<link>1</link>-<link>3</link>texttext<link>4</link>-<link>6</link>texttext</para>
    <para>texttext<link>1</link>-<link>4</link>texttext</para>
</doc>

I wrote the following XSLT for this task:

    <xsl:template match="node()|@*">
        <xsl:copy>
            <xsl:apply-templates select="node()|@*"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="page">
        <link>
            <xsl:apply-templates/>
        </link>
    </xsl:template>

    <xsl:template match="page[following-sibling::node()[1][self::page]]">
        <link>
            <xsl:apply-templates/>
        </link>
        <xsl:text>,</xsl:text>
        <link>
            <xsl:apply-templates select="following-sibling::*[1]"/>
        </link>
    </xsl:template>

    <xsl:template match="page[following-sibling::node()[1][self::page]][following-sibling::node()[2][self::page]]">
        <link>
            <xsl:apply-templates/>
        </link>
        <xsl:text>-</xsl:text>
        <link>
            <xsl:apply-templates select="following-sibling::*[2]"/>
        </link>
    </xsl:template>

But this approach does not work: it adds ',' when 3 successive <page> nodes appear, and it does not scale when more <page> nodes appear in succession.

Can anyone suggest a good way in XSLT to analyze the following siblings and do this task?

sanjay

1 Answer

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="page[not(preceding-sibling::node()[1][self::page])]">
        <xsl:variable name="pages" select="following-sibling::page[
            preceding-sibling::node()[1][self::page]
            and generate-id(current()) = generate-id(preceding-sibling::page[
                not(preceding-sibling::node()[1][self::page])
            ][1])
        ]" />
        <xsl:apply-templates select="." mode="link" />
        <xsl:if test="count($pages) = 1">,</xsl:if>
        <xsl:if test="count($pages) &gt; 1">-</xsl:if>
        <xsl:apply-templates select="$pages[last()]" mode="link" />
    </xsl:template>
    <xsl:template match="page" />

    <xsl:template match="page" mode="link">
        <link>
            <xsl:apply-templates select="@*|node()"/>
        </link>
    </xsl:template>
</xsl:transform>

Result:

<doc>
    <para>texttext<link>1</link>texttext<link>1</link>texttext</para>
    <para>texttext<link>1</link>,<link>2</link>texttext</para>
    <para>texttext<link>1</link>-<link>3</link>texttext<link>4</link>-<link>6</link>texttext</para>
    <para>texttext<link>1</link>-<link>4</link>texttext</para>
</doc>

Here,

<xsl:template match="page[not(preceding-sibling::node()[1][self::page])]">

matches any <page> that begins a "range" of consecutive pages.

Selecting the remaining pages of a consecutive range is a bit tricky but can be done like this:

  • of all the following sibling pages, select those that
    • are themselves immediately preceded by a <page> (i.e. "part of a range") and
    • have the current node as their closest preceding <page> that is not itself directly preceded by another <page> (i.e. "the closest <page> that starts a range").

Given that we only process <page> nodes that begin a range in this template, this amounts to "is part of the current range".

In XPath terms, as shown above:

following-sibling::page[
    preceding-sibling::node()[1][self::page]
    and generate-id(current()) = generate-id(preceding-sibling::page[
        not(preceding-sibling::node()[1][self::page])
    ][1])
]
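
Since the stylesheet above already declares version="2.0", a grouping-based variant may be worth considering when an XSLT 2.0 processor is available. The following is a minimal sketch, not a tested drop-in replacement. It uses xsl:for-each-group with group-adjacent to split the children of each parent into runs of <page> and non-<page> nodes. The match pattern *[page] and the boolean grouping key are choices made for this sketch. As with the XPath above, any node between two <page> elements (including whitespace-only text) ends a range.

```xslt
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" encoding="UTF-8" indent="yes" />

    <!-- identity template: copy everything that is not handled below -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- any element with <page> children: process its children in adjacent runs -->
    <xsl:template match="*[page]">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- the grouping key is true for <page> nodes, false for everything else -->
            <xsl:for-each-group select="node()" group-adjacent="boolean(self::page)">
                <xsl:choose>
                    <xsl:when test="current-grouping-key()">
                        <!-- a run of consecutive <page> nodes: emit first, separator, last -->
                        <xsl:apply-templates select="current-group()[1]"/>
                        <xsl:if test="count(current-group()) = 2">,</xsl:if>
                        <xsl:if test="count(current-group()) &gt; 2">-</xsl:if>
                        <!-- selects nothing when the run holds a single <page> -->
                        <xsl:apply-templates select="current-group()[position() &gt; 1][last()]"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <!-- a run of other nodes: process unchanged -->
                        <xsl:apply-templates select="current-group()"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:for-each-group>
        </xsl:copy>
    </xsl:template>

    <!-- turn a <page> into a <link>, keeping its content -->
    <xsl:template match="page">
        <link>
            <xsl:apply-templates select="@*|node()"/>
        </link>
    </xsl:template>
</xsl:transform>
```

Each run is visited once, so the cost does not grow with the length of a range the way repeated sibling-axis lookups can. The trade-off is that this sketch requires XSLT 2.0, whereas the XPath above uses only XSLT 1.0 constructs.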
Tomalak