
I have some XML that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <issue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <comment text="&lt;div class=&quot;wiki text&quot;&gt;&lt;h4&gt;Tom Fenech&lt;/h4&gt;Here is a comment&lt;/div&gt;&#10;"/>
    </issue>
</root>

As you can see, the text attribute in the comment node contains escaped HTML. I would like to get the contents of the attribute as XHTML, which I currently do inside a template using:

<xsl:value-of select="@text" disable-output-escaping="yes" />

That gets me the HTML in the final output:

<div class="wiki text"><h4>Tom Fenech</h4>Here is a comment</div>

But I want to be able to extract the contents of the <h4> tag to use elsewhere. More generally, it would be nice to be able to manipulate this markup once it has been unescaped.

How do I apply further templates to the output of the <xsl:value-of />?

I am currently using the PHP built-in XSLT processor, which supports XSLT version 1.0, although I would be willing to consider using an alternative processor if features from newer versions make this possible.

Tom Fenech
  • 1
    Which XSLT processor do you use or can you use? Saxon 9.6 HE edition is an XSLT 3.0 processor that would allow you to use `` or perhaps better ``. See http://www.w3.org/TR/xpath-functions-30/#func-parse-xml. If you want to make use of `value-of` and `disable-output-escaping` then you always need two stylesheets, the first which outputs the attribute value using `disable-output-escaping` and the second which consumes the result of the first. – Martin Honnen Nov 19 '14 at 15:49
  • @Martin I'm using the XSLT processor that comes with PHP (which supports only version 1.0). I've updated the question. I understand that it is possible to use other XSLT processors (such as Saxon) with PHP, so I would still be interested in solutions that used features from more modern versions. – Tom Fenech Nov 19 '14 at 15:59

2 Answers


You cannot apply templates to unparsed (escaped or CDATA) text. See some previous answers that may be relevant to you:

Parsing html with xslt

XSLT: Reading a param that's an xml document passed as a string

how to parse the xml inside CDATA of another xml using xslt?
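
Since the stylesheet itself cannot re-parse the string, one workaround is to expand the attribute in PHP before the transformation runs. The following is only a minimal sketch under some assumptions: `expandEscapedAttribute` is a hypothetical helper name, the escaped markup is assumed to be HTML that `DOMDocument::loadHTML()` can read, and plain ASCII content is assumed (non-ASCII text may need an explicit encoding declaration). After the call, the markup exists as real child nodes of `comment`, so an XSLT 1.0 stylesheet could match `comment/div/h4` directly.

```php
<?php
// Sample input from the question (nowdoc, so nothing is interpolated).
$xml = <<<'EOB'
<root>
    <issue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <comment text="&lt;div class=&quot;wiki text&quot;&gt;&lt;h4&gt;Tom Fenech&lt;/h4&gt;Here is a comment&lt;/div&gt;&#10;"/>
    </issue>
</root>
EOB;

// Hypothetical helper: replaces an escaped-HTML attribute with parsed child nodes.
function expandEscapedAttribute(DOMDocument $xmldoc, string $elementName, string $attrName): void
{
    // Snapshot the node list so we iterate over a plain array.
    $elements = iterator_to_array($xmldoc->getElementsByTagName($elementName));

    foreach ($elements as $element) {
        // getAttribute() returns the value with the XML entities already decoded.
        $html = $element->getAttribute($attrName);
        if (trim($html) === '') {
            continue; // nothing to parse
        }

        // Collect libxml parse warnings internally instead of printing them.
        $previous = libxml_use_internal_errors(true);
        $htmldoc = new DOMDocument();
        $htmldoc->loadHTML($html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);

        // loadHTML() wraps fragments in <html><body>; copy the body's children over.
        $body = $htmldoc->getElementsByTagName('body')->item(0);
        if ($body === null) {
            continue;
        }
        foreach ($body->childNodes as $child) {
            $element->appendChild($xmldoc->importNode($child, true));
        }
        $element->removeAttribute($attrName);
    }
}

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);
expandEscapedAttribute($xmldoc, 'comment', 'text');

// The heading is now reachable with an ordinary XPath expression.
$xpath = new DOMXPath($xmldoc);
$heading = $xpath->evaluate('string(//comment/div/h4)');
echo $heading, "\n";
```

The expanded `$xmldoc` can then be passed to `XSLTProcessor::transformToXML()` like any other source document.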

Community
michael.hor257k
  • Thanks for the links - I suspected that this may not be possible in XSLT but was hoping there might be a way round it. – Tom Fenech Nov 21 '14 at 10:26

Here's one way you could do it, by calling into a PHP function from XSLT:

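// Parse an HTML string into a DOM document that the stylesheet can navigate.
function parseHTMLString($html)
{
    $doc = new DOMDocument();
    // loadHTML() wraps fragments in <html><body>, so the XPath uses // to reach the div.
    $doc->loadHTML($html);
    return $doc;
}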

$xml = <<<EOB
<root>
    <issue xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <comment text="&lt;div class=&quot;wiki text&quot;&gt;&lt;h4&gt;Tom Fenech&lt;/h4&gt;Here is a comment&lt;/div&gt;&#10;"/>
    </issue>
</root>
EOB;

$xsl = <<<EOB
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     xmlns:php="http://php.net/xsl"
     xsl:extension-element-prefixes="php">
<xsl:output method="html" encoding="utf-8" indent="yes"/>
 <xsl:template match="comment">
   <xsl:apply-templates select="php:functionString('parseHTMLString', @text)//div/h4"/>
 </xsl:template>

 <xsl:template match="div/h4">
   <h2><xsl:apply-templates/></h2>
 </xsl:template>
</xsl:stylesheet>
EOB;

$xmldoc = new DOMDocument();
$xmldoc->loadXML($xml);

$xsldoc = new DOMDocument();
$xsldoc->loadXML($xsl);

$proc = new XSLTProcessor();
$proc->registerPHPFunctions('parseHTMLString');
$proc->importStylesheet($xsldoc);
echo $proc->transformToXML($xmldoc);
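
One practical caveat: `loadHTML()` may emit PHP warnings for markup that libxml does not recognise, and it rejects an empty string. The following is a hedged sketch of a more defensive variant of the callback; the name `parseHTMLStringQuietly` is hypothetical, and returning an empty document for blank input is an assumption about what is convenient rather than a requirement.

```php
<?php
// Hypothetical variant of parseHTMLString() that keeps parser warnings out of the output.
function parseHTMLStringQuietly(string $html): DOMDocument
{
    $doc = new DOMDocument();

    // loadHTML() does not accept an empty string, so return an empty document instead.
    if (trim($html) === '') {
        return $doc;
    }

    // Collect libxml warnings internally, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Quick check outside XSLT, using the markup from the question.
$doc = parseHTMLStringQuietly('<div class="wiki text"><h4>Tom Fenech</h4>Here is a comment</div>');
$h4 = $doc->getElementsByTagName('h4')->item(0);
echo $h4->textContent, "\n";
```

To use it from the stylesheet, the new name would need to be passed to `registerPHPFunctions()` and referenced in the `php:functionString(...)` call.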
Tom Fenech
Martin Honnen
  • Impressive, thanks! There were a couple of changes to make it work - I had to add `xsl:extension-element-prefixes="php"` to the opening tag and change the syntax of the function call. – Tom Fenech Nov 19 '14 at 16:41