I have an XML document that contains a "body" element, which in turn contains XHTML. I'm trying to process that HTML in order to remove some non-standard tags. No namespaces are used in the source XML document.
The XML looks like this:
<article>
<body>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3 <fig></fig></p>
</body>
</article>
The XSLT looks like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p">
<![CDATA[<div>HIT A P</div>]]>
<xsl:apply-templates mode="copy" select="@*|node()"/>
</xsl:template>
</xsl:stylesheet>
The output is this, and I don't understand why it only matches the first p tag:
<div>HIT A P</div>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3 <fig></fig></p>
Any idea why the p template only fires for the first paragraph rather than for all three?
I'm also trying to figure out why adding this template doesn't remove the "fig" elements:
<xsl:template match="fig" />
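For comparison, here is a minimal sketch of the usual identity-transform pattern, assuming body holds real p and fig elements (not escaped text). In the stylesheet above, the p template applies templates with mode="copy", and no templates are declared for that mode, so the built-in rules take over: element tags are dropped, text is kept, and the default-mode fig template is never consulted for anything inside a p. Processing the children in the default mode, and emitting the marker as a literal div element instead of CDATA text (which a serializer would normally escape), avoids both issues.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copies every node and attribute unchanged
       unless a more specific template matches it -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Marker before each p; the p itself is copied and its children are
       processed in the DEFAULT mode, so the fig rule below still applies -->
  <xsl:template match="p">
    <div>HIT A P</div>
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Empty template: a matched fig element produces no output -->
  <xsl:template match="fig"/>
</xsl:stylesheet>
```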
Thanks for taking the time to help me out.
UPDATE: Thank you so much for the reply. I oversimplified the issue. What I'm really doing is running two XSLT transformations: the first organizes the data into a standard format, and the second looks at the HTML within the body and copies everything except certain non-standard tags.
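As a minimal sketch of how stage one could keep the body as markup: the stage-one stylesheet isn't shown here, so the rule below is hypothetical and assumes stage one reads the body element from the first sample and only needs to rename it to bodytext. xsl:copy-of copies the p and fig children as element nodes, so a later stage can match them. Building the markup as a string instead (for example with a CDATA section, like the HIT A P marker) is one way to end up with escaped text.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity template: copy everything not matched more specifically -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical stage-one rule: rename body to bodytext while keeping
       its children (p, fig, ...) as element nodes rather than text -->
  <xsl:template match="body">
    <bodytext>
      <xsl:copy-of select="node()"/>
    </bodytext>
  </xsl:template>
</xsl:stylesheet>
```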
I think the problem is that after the first transformation, the HTML within the body is HTML-encoded (escaped), and the second transformation doesn't seem to be able to process it. Here's a better example of what is really happening:
This is the new XML (the output of the earlier XSLT transformation, which is why the body text is encoded):
<document>
<article>
<title>SAMPLE TITLE</title>
<bodytext>
&lt;p&gt;Paragraph 1&lt;/p&gt;
&lt;p&gt;Paragraph 2&lt;/p&gt;
&lt;p&gt;Paragraph 3&lt;/p&gt;
&lt;p&gt;
Paragraph 4 - contains non-standard fig tag
&lt;fig&gt;
&lt;graphic href="testgraphic.jpg"/&gt;
&lt;/fig&gt;
&lt;/p&gt;
</bodytext>
</article>
</document>
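If stage one can't be changed, one XSLT 1.0 workaround is an extra pass between the two stages that writes the bodytext string back out unescaped. This is a minimal sketch under several assumptions: it relies on disable-output-escaping, which the XSLT 1.0 spec makes optional for processors; it only takes effect when the result is serialized, so the output has to be written to a file or string and re-parsed before stage two runs; and the escaped content must be well-formed once unescaped.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8"/>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Write the escaped text of bodytext as raw markup. Only works if the
       processor honours disable-output-escaping and the result is
       serialized (not handed on as an in-memory tree). -->
  <xsl:template match="bodytext">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:value-of select="." disable-output-escaping="yes"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```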
Here is the new XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="p">
<![CDATA[<div>HIT A P</div>]]>
<xsl:apply-templates mode="copy" select="@*|node()"/>
</xsl:template>
<xsl:template match="bodytext">
<![CDATA[<div>HELLO FROM BODYTEXT</div>]]>
<xsl:element name="bodytext">
<xsl:apply-templates />
</xsl:element>
</xsl:template>
<!-- THIS APPEARS TO NEVER GET HIT -->
<xsl:template match="fig" />
</xsl:stylesheet>
When I run that, I get the following:
<document>
<article>
<title>SAMPLE TITLE</title>
<div>HELLO FROM BODYTEXT</div><bodytext>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3</p>
<p>
Paragraph 4 - contains non-standard fig tag
<fig>
<graphic href="testgraphic.jpg"/>
</fig>
</p>
</bodytext>
</article>
</document>
In this example, the transform neither processes the individual paragraphs nor removes the fig. However, if the XML isn't HTML-encoded, it works. Here's the working XML:
<document>
<article>
<title>SAMPLE TITLE</title>
<bodytext>
<p>Paragraph 1</p>
<p>Paragraph 2</p>
<p>Paragraph 3 <fig></fig></p>
</bodytext>
</article>
</document>
And this is the output:
<document>
<article>
<title>SAMPLE TITLE</title>
<div>HELLO FROM BODYTEXT</div><bodytext>
<div>HIT A P</div>Paragraph 1
<div>HIT A P</div>Paragraph 2
<div>HIT A P</div>Paragraph 3
</bodytext>
</article>
</document>
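The difference between the two inputs can be checked directly. To the XML parser, escaped markup such as &lt;p&gt; is ordinary character data, so in the encoded version bodytext contains only text and no p or fig elements for match="p" or match="fig" to find. A small diagnostic sketch: run against the encoded sample it should report 0 element children, and against the working sample 3.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Reports what the parser actually put inside bodytext -->
  <xsl:template match="/">
    <xsl:text>element children of bodytext: </xsl:text>
    <xsl:value-of select="count(//bodytext/*)"/>
    <xsl:text>&#10;text-node children of bodytext: </xsl:text>
    <xsl:value-of select="count(//bodytext/text())"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```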
Do you know how I can run that second process when the incoming data is HTML-encoded? Thanks again.
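With an XSLT 3.0 processor (Saxon, for example), stage two can parse the escaped text itself. A minimal sketch, assuming the unescaped content of bodytext is a well-formed XML fragment: parse-xml-fragment() turns the string back into nodes, and those nodes then go through the ordinary templates, so the empty fig template can match. XSLT 1.0 processors don't provide this function.

```xml
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- XSLT 3.0 shorthand for the identity transform in the default mode -->
  <xsl:mode on-no-match="shallow-copy"/>

  <!-- bodytext holds escaped markup, i.e. a string. parse-xml-fragment()
       rebuilds the nodes, which are then processed by the templates here. -->
  <xsl:template match="bodytext">
    <bodytext>
      <xsl:apply-templates select="parse-xml-fragment(string(.))/node()"/>
    </bodytext>
  </xsl:template>

  <!-- Drop the non-standard fig element together with its graphic child -->
  <xsl:template match="fig"/>
</xsl:stylesheet>
```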