Parse HTML inside the CDATA text

Question

I need the data inside the CDATA sections to be parsed as HTML.

<?xml version="1.0" encoding="utf-8" ?>
<test>
  <test1>
    <![CDATA[ &lt;B&gt; Test Data1 &lt;/B&gt; ]]>
  </test1>

  <test2>
    <![CDATA[ &lt;B&gt; Test Data2 &lt;/B&gt; ]]>
  </test2>

  <test3>
    <![CDATA[ &lt;B&gt; Test Data3 &lt;/B&gt; ]]>
  </test3>
</test>

From the above input XML, I need the output to be rendered as HTML.

But I am getting this output:

<B>Test Data1</B>
<B>Test Data2</B>
<B>Test Data3</B>

What I actually need is for the text to appear in bold:

**Test Data1
Test Data2
Test Data3**

The input comes from an external system, so we cannot change the text inside the CDATA sections.
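To see why the tags come out as text, it helps to look at the string value a parser actually reports for each `testN` element. The sketch below uses Python's standard `xml.etree.ElementTree` purely as a stand-in for the XSLT processor (an assumption for illustration; `string_values` is a hypothetical helper, not part of the question). Because the CDATA section protects the ampersands, the value still contains `&lt;B&gt;` as literal characters, so the data carries one more level of escaping than plain `<B>` markup would.

```python
import xml.etree.ElementTree as ET

# The question's input, copied verbatim.
SOURCE = """<?xml version="1.0" encoding="utf-8" ?>
<test>
  <test1>
    <![CDATA[ &lt;B&gt; Test Data1 &lt;/B&gt; ]]>
  </test1>

  <test2>
    <![CDATA[ &lt;B&gt; Test Data2 &lt;/B&gt; ]]>
  </test2>

  <test3>
    <![CDATA[ &lt;B&gt; Test Data3 &lt;/B&gt; ]]>
  </test3>
 </test>"""


def string_values(xml_text: str) -> dict:
    """Return the trimmed string value of each child of the root element,
    roughly what xsl:value-of select="." sees for each testN element."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {child.tag: (child.text or "").strip() for child in root}


values = string_values(SOURCE)
# The CDATA section keeps the entity references as literal characters;
# the parser does not turn them into "<" and ">".
print(values["test1"])  # &lt;B&gt; Test Data1 &lt;/B&gt;
```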

Rishe, I have a big XSLT that handles other scenarios too; this scenario is one part of it. — Blossom, Apr 18 '13 at 15:57
Does your input really look like your example? Escaped HTML in CDATA? Perhaps this could help: http://stackoverflow.com/questions/2067116/convert-an-xml-element-whose-content-is-inside-cdata — hr_117, Apr 18 '13 at 16:08
When I transform to HTML, I expect the text to be in bold; instead, the output looks like the above. And the situation in the linked question is different from my issue. — Blossom, Apr 18 '13 at 18:03

score 1 · Answer 1 · answered Apr 18 '13 at 15:52


Parsing as HTML is only possible with an extension function (or with XSLT 2.0 and an HTML parser written in XSLT 2.0). However, if you want to create HTML output and output the contents of the testX elements as HTML, you can do that with e.g.

<xsl:template match="test/*[starts-with(local-name(), 'test')]">
  <!-- write the element's string value without escaping < and & -->
  <xsl:value-of select="." disable-output-escaping="yes"/>
</xsl:template>

Note, however, that disable-output-escaping is an optional serialization feature that is not supported by all XSLT processors in all use cases. For instance, it is not supported with client-side XSLT in Mozilla browsers.

Martin Honnen

Hi Honnen, when I run your XSLT, the result is displayed as Test Data1 Test Data2 Test Data3, but I need the output in bold. – Blossom Apr 18 '13 at 18:15
Yes, sorry, I overlooked that your input data uses both a CDATA section and entity references, so my suggestion does not work. If you had e.g. `<![CDATA[Test data]]>`, then disable-output-escaping would do. Can you tell us which XSLT processor you want to use to solve this? Are you developing in Visual Studio, and do you want to write a .NET application using .NET's XslCompiledTransform? – Martin Honnen Apr 19 '13 at 09:43
Please also tell us more about the input format: the last sample has `<![CDATA[ <p> Test Data3 </B> ]]>` with `p` being closed as `/B`, which would further complicate things, as neither SGML nor XML parsers could handle that without throwing an error. Is that a mistake in your posting, or do you really need to handle input data with such errors? – Martin Honnen Apr 19 '13 at 09:46
Sorry Honnen, I have edited my input file; I mistakenly put P instead of B. I am using the XML editor in Visual Studio and I am not using any .NET code. – Blossom Apr 19 '13 at 14:29
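The point about a CDATA section combined with entity references can be checked with a small Python sketch. It assumes a browser's HTML parser treats the output like Python's `html.parser.HTMLParser`, which decodes entities in text by default; `TagCollector` and `render` are hypothetical names used only for illustration. Writing the CDATA string out verbatim, as `disable-output-escaping` does, hands the browser `&lt;B&gt;`, which it turns into the visible text `<B>`; only after one more round of unescaping does a real `b` element appear.

```python
from html import unescape
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record which start tags and which text an HTML parser sees."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities in text are decoded
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # tag names arrive lower-cased

    def handle_data(self, data):
        self.text.append(data)


def render(html_fragment: str):
    """Return (start tags, visible text) for an HTML fragment."""
    parser = TagCollector()
    parser.feed(html_fragment)
    parser.close()
    return parser.tags, "".join(parser.text)


cdata_value = "&lt;B&gt; Test Data1 &lt;/B&gt;"

# Written verbatim: the entities become plain text, no <b> element is created.
print(render(cdata_value))  # ([], '<B> Test Data1 </B>')

# One extra level of unescaping turns the text into real markup.
print(render(unescape(cdata_value)))  # (['b'], ' Test Data1 ')
```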

score 0 · Answer 2 · edited May 23 '17 at 12:28

If you have to stay with XSLT 1.0, you have to run two transformation passes.

In the first pass, copy your XML but remove the CDATA by generating its content with disable-output-escaping="yes" (see the answer from @Martin Honnen).
In the second pass, you can access the HTML part.

But this may only be possible if the HTML part follows the rules for well-formed XML (XHTML). If not, an input switch such as the one in xsltproc may help you work with HTML, e.g.:

 --html: the input document is(are) an HTML file(s)
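The two-pass idea can be sketched in Python as follows, assuming the HTML inside the CDATA is well-formed XML, as the answer requires. `first_pass` mimics a copy with `disable-output-escaping="yes"` by writing each CDATA string into the intermediate document without re-escaping it; `second_pass` re-parses that document, where the text now holds real `<` and `>` characters, and parses it once more as an XML fragment to reach the `B` element. Both function names are hypothetical, and in the asker's setting the two passes would be two XSLT transformations rather than Python.

```python
import xml.etree.ElementTree as ET

# Input in the same shape as the question's (whitespace trimmed).
SOURCE = b"""<test>
  <test1><![CDATA[ &lt;B&gt; Test Data1 &lt;/B&gt; ]]></test1>
  <test2><![CDATA[ &lt;B&gt; Test Data2 &lt;/B&gt; ]]></test2>
  <test3><![CDATA[ &lt;B&gt; Test Data3 &lt;/B&gt; ]]></test3>
</test>"""


def first_pass(source: bytes) -> str:
    """Copy the document, writing each CDATA string out verbatim
    (like disable-output-escaping="yes"), so no CDATA remains."""
    root = ET.fromstring(source)
    parts = [f"<{c.tag}>{c.text or ''}</{c.tag}>" for c in root]
    return "<test>" + "".join(parts) + "</test>"


def second_pass(intermediate: str) -> dict:
    """Re-parse the intermediate XML; its text now contains real markup,
    which is parsed again as an XML fragment to reach the B element."""
    result = {}
    for child in ET.fromstring(intermediate):
        fragment = ET.fromstring(f"<wrap>{child.text or ''}</wrap>")
        bold = fragment.find("B")
        result[child.tag] = bold.text.strip() if bold is not None else None
    return result


print(second_pass(first_pass(SOURCE)))
# {'test1': 'Test Data1', 'test2': 'Test Data2', 'test3': 'Test Data3'}
```

If a fragment is not well-formed (for example the `<p> ... </B>` mix-up mentioned in the comments), `ET.fromstring` raises `ET.ParseError`, which is the limitation the answer warns about.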

See also: Convert an xml element whose content is inside CDATA
