
I am writing a very simple XSLT to convert an HTML page to an XML file.

But even the starting point is not as straightforward as I expected. My first goal is to convert the <html> tag into a <topic> tag.

I wrote the following XSLT:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>

      <xsl:template match="/">
        <xsl:apply-templates/>
      </xsl:template>

      <xsl:template match="html">
        <topic>
          <xsl:text> Conversion Test</xsl:text>
        </topic>
      </xsl:template>

    </xsl:stylesheet>

However, when I run this XSLT, the resulting XML has exactly the same content as the original HTML page. It seems that the third template (the one matching the <html> tag) never fires.

The source HTML looks like this:

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>..</head>
      <body>...</body>
    </html>
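The symptom can be reproduced with a small Java harness around the JDK's built-in XSLT 1.0 processor (`javax.xml.transform`). This is a sketch under stated assumptions. The class and method names are made up for illustration. The input is a hypothetical stand-in for the page above, with the DOCTYPE dropped so the parser does not try to download the external DTD. The output is the input copied back unchanged, which is the behaviour described above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OriginalStylesheetDemo {
    // The question's three templates inside a minimal XSLT 1.0 stylesheet.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"/\"><xsl:apply-templates/></xsl:template>"
        // Unprefixed name: only matches an <html> element in no namespace.
        + "<xsl:template match=\"html\">"
        + "<topic><xsl:text> Conversion Test</xsl:text></topic>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical stand-in for the page; the DOCTYPE is omitted so the
    // parser does not try to download the external XHTML DTD.
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Test</title></head>"
        + "<body><p>Hello</p></body></html>";

    // Runs a stylesheet over an XML string with the JDK's built-in processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XHTML);
    }

    public static void main(String[] args) {
        // Expected: the input copied back, with no <topic> element.
        System.out.println(run());
    }
}
```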

Could experts help me a little here?

Kevin

3 Answers


XSLT 1.0:

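Your source declares a default namespace (xmlns="http://www.w3.org/1999/xhtml"), so <html> and every element inside it are in the XHTML namespace. In XSLT 1.0, an unprefixed name in a pattern, such as match="html", only matches elements in no namespace. Try adding xmlns:x="http://www.w3.org/1999/xhtml" to your xsl:stylesheet and changing your match to match="x:html". (Note: you don't have to use "x"; any prefix works.)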
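Here is a sketch of the XSLT 1.0 fix, run through the JDK's built-in processor. The class name and sample input are illustrative assumptions. The only changes from the question's stylesheet are the `xmlns:x` declaration and `match="x:html"`. Because a name test has a higher default priority (0) than `node()` (-0.5), the `x:html` template wins over the identity template, and the output is a single `<topic>` element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NamespacedMatchDemo {
    // The question's stylesheet with the two suggested changes applied.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:x=\"http://www.w3.org/1999/xhtml\">" // any prefix name works
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"/\"><xsl:apply-templates/></xsl:template>"
        // Prefixed name: matches <html> in the XHTML namespace.
        + "<xsl:template match=\"x:html\">"
        + "<topic><xsl:text> Conversion Test</xsl:text></topic>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical sample input (DOCTYPE omitted to avoid fetching the DTD).
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Test</title></head>"
        + "<body><p>Hello</p></body></html>";

    // Runs a stylesheet over an XML string with the JDK's built-in processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XHTML);
    }

    public static void main(String[] args) {
        // Expected: a <topic> element containing " Conversion Test".
        System.out.println(run());
    }
}
```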

XSLT 2.0:

Either use the method above, or replace the namespace prefix in your match patterns with "*" (match="*:html"). You could also add xpath-default-namespace="http://www.w3.org/1999/xhtml" to the xsl:stylesheet.

Daniel Haley
  • Thank you, it worked! Yes, the HTML is actually XHTML, and I am using XSLT 1.0. After I added your suggested namespace, it worked great :) – Kevin Oct 27 '11 at 18:38
  • I updated the title to reflect the nature of the source document too. – Kevin Oct 27 '11 at 18:39
  • @Kevin - You're very welcome. Also, if you don't want the namespace in your XML output, add `exclude-result-prefixes="#all"` to `xsl:stylesheet`. (Note: you can replace `#all` with `x` to exclude `x` specifically; with an XSLT 1.0 processor you need to, since `#all` was introduced in XSLT 2.0.) – Daniel Haley Oct 27 '11 at 18:45
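Here is a sketch of the `exclude-result-prefixes` suggestion with the JDK's built-in XSLT 1.0 processor. The class name and sample input are illustrative assumptions. It lists the prefix `x`, which is the form XSLT 1.0 accepts. Without the attribute, the literal `<topic>` element would normally carry the `xmlns:x` declaration that is in scope in the stylesheet. With the attribute, that declaration is left out of the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ExcludePrefixDemo {
    // Fixed stylesheet plus exclude-result-prefixes="x" (XSLT 1.0 form).
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
        + " xmlns:x=\"http://www.w3.org/1999/xhtml\""
        + " exclude-result-prefixes=\"x\">" // keep xmlns:x out of the output
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "<xsl:template match=\"/\"><xsl:apply-templates/></xsl:template>"
        + "<xsl:template match=\"x:html\">"
        + "<topic><xsl:text> Conversion Test</xsl:text></topic>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical sample input (DOCTYPE omitted to avoid fetching the DTD).
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Test</title></head>"
        + "<body><p>Hello</p></body></html>";

    // Runs a stylesheet over an XML string with the JDK's built-in processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XHTML);
    }

    public static void main(String[] args) {
        // Expected: a <topic> element with no XHTML namespace declaration.
        System.out.println(run());
    }
}
```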

You may want to try removing the first template, or making it more specific than matching every node with node().

Ludovic Kuty
  • Are you saying remove the identity transform? – Daniel Haley Oct 27 '11 at 18:29
  • @lkuty, I did try removing the first template. Now the resulting XML is just one big run of text without any markup; it contains all the text from the original HTML page. – Kevin Oct 27 '11 at 18:31
  • I was wrong. I thought the first rule could be chosen instead of the third, but in fact the default priority of a pattern that names an element (0) is higher than that of `node()` (-0.5), so that could not be the problem. I just didn't think about namespaces. – Ludovic Kuty Oct 28 '11 at 07:12
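Kevin's observation that removing the identity template leaves only text follows from XSLT's built-in template rules. When no template matches an element, its children are processed, and text nodes are copied to the output. The sketch below keeps only the "/" template and the unprefixed "html" template, and runs them with the JDK's built-in XSLT 1.0 processor. The class name and sample input are illustrative assumptions. Because "html" still does not match the namespaced element, the output is just the concatenated text.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class BuiltInRulesDemo {
    // Identity template removed; the unprefixed "html" match is unchanged.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><xsl:apply-templates/></xsl:template>"
        + "<xsl:template match=\"html\">"
        + "<topic><xsl:text> Conversion Test</xsl:text></topic>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical sample input (DOCTYPE omitted to avoid fetching the DTD).
    static final String XHTML =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
        + "<head><title>Test</title></head>"
        + "<body><p>Hello</p></body></html>";

    // Runs a stylesheet over an XML string with the JDK's built-in processor.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    static String run() {
        return transform(XSL, XHTML);
    }

    public static void main(String[] args) {
        // Expected: only the text content of the input, with no markup.
        System.out.println(run());
    }
}
```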

The purpose of XSLT is to transform XML documents into other XML documents. HTML is not an XML document. While XHTML is XML, it is really HTML reformulated as XML, so I'm not sure what you want to do is easy or even possible with XSLT.

Rob