
As part of a Java 6 application, I want to find all namespace declarations in an XML document, including any duplicates.

Edit: Per Martin's request, here's the Java code I am using:

import javax.xml.xpath.*;
import org.w3c.dom.NodeList;

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath();
// compile() and evaluate() throw XPathExpressionException
XPathExpression xPathExpression = xPath.compile("//namespace::*");
NodeList nodeList = (NodeList) xPathExpression.evaluate(xmlDomDocument, XPathConstants.NODESET);
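To reproduce this outside oXygen, here is a minimal self-contained sketch that parses the sample document and runs the same expression. It assumes a current JDK, and the class name `NamespaceAxisDemo` is hypothetical. It turns on namespace awareness, which JAXP's `DocumentBuilderFactory` leaves off by default. The count it prints depends on which XPath engine JAXP selects, so treat it as a diagnostic rather than a reference result.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespaceAxisDemo {

    // The sample document from the question, inlined (whitespace-only text nodes dropped).
    static final String SAMPLE =
        "<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>"
        + "<ele:one>a</ele:one><two att:c='d'>e</two><three>txt:f</three></root>";

    // JAXP parsers are not namespace-aware unless asked to be.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates //namespace::* with whichever XPath engine JAXP selects by default.
    static NodeList namespaceAxis(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        return (NodeList) xPath.evaluate("//namespace::*", doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        NodeList nodes = namespaceAxis(parse(SAMPLE));
        // The count depends on the engine that evaluated the expression.
        System.out.println("selected: " + nodes.getLength());
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            System.out.println(n.getNodeName() + " - " + n.getNodeValue());
        }
    }
}
```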

Suppose I have this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
    <ele:one>a</ele:one>
    <two att:c="d">e</two>
    <three>txt:f</three>
</root>

To find all namespace declarations, I applied this XPath expression to the XML document using XPath 1.0:

//namespace::*

It finds 4 namespace declarations, which is what I expect (and desire):

/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace

But if I switch to XPath 2.0, I get 16 namespace declarations (each of the previous declarations four times), which is not what I expect (or desire):

/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com
/root[1]/@xmlns:txt - textnode.com

The same difference appears even when I use the unabbreviated form of the XPath expression:

/descendant-or-self::node()/namespace::*

And it is seen across a variety of XML parsers (LIBXML, MSXML.NET, Saxon) as tested in oXygen. (Edit: As I mention later in the comments, this statement is not true. Though I thought I was testing a variety of XML parsers, I really wasn't.)

Question #1: Why does the result differ between XPath 1.0 and XPath 2.0?

Question #2: Is it possible (and reasonable) to get the desired results using XPath 2.0?

Hint: The distinct-values() function in XPath 2.0 will not return the desired results, because I want all namespace declarations, even when the same namespace is declared more than once. For example, consider this XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <bar:one xmlns:bar="http://www.bar.com">alpha</bar:one>
    <bar:two xmlns:bar="http://www.bar.com">bravo</bar:two>
</root>

The desired result is:

/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace
/root[1]/bar:one[1]/@xmlns:bar - http://www.bar.com
/root[1]/bar:two[1]/@xmlns:bar - http://www.bar.com
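Because the goal is declarations rather than namespace nodes, one alternative that sidesteps XPath entirely is to read them from the DOM. With a namespace-aware JAXP parser, each declaration appears as an attribute in the `http://www.w3.org/2000/xmlns/` namespace. This is a minimal sketch assuming a current JDK. The class and method names are hypothetical, and the implicit xml binding is added by hand on the root element to match the desired result above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceDeclarations {

    // One "path - URI" line per xmlns attribute in the document, duplicates included.
    static List<String> declarations(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Element root = doc.getDocumentElement();
        List<String> out = new ArrayList<>();
        // The xml prefix is bound implicitly; report it once, on the root element.
        out.add(path(root) + "/@xmlns:xml - " + XMLConstants.XML_NS_URI);
        collect(root, out);
        return out;
    }

    private static void collect(Element e, List<String> out) {
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            // Namespace declarations live in the reserved xmlns namespace.
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                out.add(path(e) + "/@" + a.getName() + " - " + a.getValue());
            }
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) c, out);
            }
        }
    }

    // Builds an oXygen-style path such as /root[1]/bar:one[1].
    private static String path(Node n) {
        if (n.getNodeType() != Node.ELEMENT_NODE) {
            return "";
        }
        int pos = 1;
        for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
            if (s.getNodeType() == Node.ELEMENT_NODE && s.getNodeName().equals(n.getNodeName())) {
                pos++;
            }
        }
        return path(n.getParentNode()) + "/" + n.getNodeName() + "[" + pos + "]";
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><bar:one xmlns:bar='http://www.bar.com'>alpha</bar:one>"
                + "<bar:two xmlns:bar='http://www.bar.com'>bravo</bar:two></root>";
        declarations(xml).forEach(System.out::println);
    }
}
```

For the second sample document this prints the three lines listed as the desired result, in the same order.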
james.garriss
  • James, please show us the code that finds the namespace "declarations". In my understanding the XPath `//namespace::*` finds all namespace nodes, which are different from namespace declarations: namespace nodes exist per element node and are not shared between nodes. So with an XML document that has four element nodes and three namespace declarations on the root element, the path should find four namespace nodes for each of the four elements. That should be the same in XPath 1.0 and 2.0 as far as I can tell. Also, a notation like `/root[1]/@xmlns:txt` is rather misleading. – Martin Honnen Apr 18 '12 at 13:04
  • The /root[1]/@xmlns:txt notation comes from oXygen. That's their representation of the nodes in the nodelist, which is fine. – james.garriss Apr 18 '12 at 13:47
  • Java code added above. Pretty standard stuff. Thanks for explanation. – james.garriss Apr 18 '12 at 13:52
  • I think one problem is that the Java API you use works on the DOM node model, or rather maps the XPath/XSLT data model to the DOM model. The DOM model has only attribute nodes, some of which are namespace declaration attributes. The XSLT/XPath model has attribute nodes and namespace nodes, and namespace declarations are not attribute nodes in that model. So for an element `foo` whose markup contains a single namespace declaration, the `foo` element has no attribute nodes in the XPath/XSLT data model but has two in-scope namespace nodes (the one declared in the markup and the built-in one for the xml namespace). – Martin Honnen Apr 18 '12 at 14:14
  • Continuing my comment: The problem is that you select some namespace nodes with the XPath `//namespace::*` but then use an API that presents the result as DOM nodes. That mapping is probably implementation-dependent. There are other known problems when mapping XPath to DOM: e.g. with `<foo><![CDATA[text 1]]>text2</foo>` it is implementation-dependent what `/foo/text()[1]` selects when mapped to DOM, since in DOM the `foo` element has two child nodes, a CDATA section node and a text node, while the XPath model has only one text node. – Martin Honnen Apr 18 '12 at 14:18
  • James, I had completely forgotten that you could be interested in an XPath 2.0 solution. I have updated my answer with an XPath 2.0 expression that selects all "distinct" namespace nodes in an XML document and produces their readable representations. – Dimitre Novatchev May 03 '12 at 12:27

4 Answers


I think this will get all namespaces, without any duplicates:

for $i in 1 to count(//namespace::*) return 
if (empty(index-of((//namespace::*)[position() = (1 to ($i - 1))][name() = name((//namespace::*)[$i])], (//namespace::*)[$i]))) 
then (//namespace::*)[$i] 
else ()
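The filtering step in this expression keeps `(//namespace::*)[$i]` only if no earlier namespace node has the same name and value. Outside XPath, the same first-occurrence test can be sketched in Java with an insertion-ordered set. This is a minimal sketch assuming Java 9+ (for `Map.entry`). The class name and the hard-coded input (the 16 namespace nodes of the first sample, four per element) are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class FirstOccurrenceFilter {

    // Keeps each (prefix, URI) pair only the first time it appears in document order.
    static List<Map.Entry<String, String>> firstOccurrences(List<Map.Entry<String, String>> nsNodes) {
        // LinkedHashSet drops later duplicates but preserves first-seen order.
        return new ArrayList<>(new LinkedHashSet<>(nsNodes));
    }

    public static void main(String[] args) {
        // Illustrative input: the four in-scope bindings repeated on each of the four elements.
        List<Map.Entry<String, String>> nodes = new ArrayList<>();
        for (int element = 0; element < 4; element++) {
            nodes.add(Map.entry("xml", "http://www.w3.org/XML/1998/namespace"));
            nodes.add(Map.entry("att", "attribute.com"));
            nodes.add(Map.entry("ele", "element.com"));
            nodes.add(Map.entry("txt", "textnode.com"));
        }
        firstOccurrences(nodes).forEach(e -> System.out.println(e.getKey() + " - " + e.getValue()));
    }
}
```

Because this sketch keys only on the (prefix, URI) pair, two identical declarations on different elements reduce to a single pair.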
Roger Costello
  • There it is! This XPath 2.0 expression finds all namespace declarations, and it works on both of the examples I gave in my OP. The elegance of this approach is that it processes the namespaces as sequences. Well done, @Roger. – james.garriss May 03 '12 at 11:53

To find all namespace declarations, I applied this xPath statement to the XML document using xPath 1.0:

//namespace::*

It finds 4 namespace declarations, which is what I expect (and desire):

/root[1]/@xmlns:att - attribute.com
/root[1]/@xmlns:ele - element.com 
/root[1]/@xmlns:txt - textnode.com 
/root[1]/@xmlns:xml - http://www.w3.org/XML/1998/namespace

You are using a non-compliant (buggy) XPath 1.0 implementation.

I get different and correct results with all XSLT 1.0 processors I have. This transformation (just evaluating the XPath expression and printing one line for each selected namespace node):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
    <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
     <xsl:for-each select="//namespace::*">
       <xsl:value-of select="concat(name(), ': ', ., '&#xA;')"/>
     </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>

when applied on the provided XML document:

<root xmlns:ele="element.com" xmlns:att="attribute.com" xmlns:txt="textnode.com">
    <ele:one>a</ele:one>
    <two att:c="d">e</two>
    <three>txt:f</three>
</root>

produces a correct result:

xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com
xml: http://www.w3.org/XML/1998/namespace
ele: element.com
att: attribute.com
txt: textnode.com

with all of these XSLT 1.0 and XSLT 2.0 processors:

MSXML3, MSXML4, MSXML6, .NET XslCompiledTransform, .NET XslTransform, Altova (XML SPY), Saxon 6.5.4, Saxon 9.1.07, XQSharp.

Here is a short C# program that confirms the number of nodes selected in .NET is 16:

namespace TestNamespaces
{
    using System;
    using System.IO;
    using System.Xml.XPath;

    class Test
    {
        static void Main(string[] args)
        {
            string xml =
@"<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>
    <ele:one>a</ele:one>
    <two att:c='d'>e</two>
    <three>txt:f</three>
</root>";
            XPathDocument doc = new XPathDocument(new StringReader(xml));

            double count = 
              (double) doc.CreateNavigator().Evaluate("count(//namespace::*)");

            Console.WriteLine(count);
        }
    }
}

The result is:

16.

UPDATE:

This is an XPath 2.0 expression that finds just the "distinct" namespace nodes and produces a line of name - value pairs for each of them:

for $i in distinct-values(
             for $ns in //namespace::*
               return
                 index-of(
                          (for $x in //namespace::*
                             return concat(name($x), ' ', string($x))
                          ),
                          concat(name($ns), ' ', string($ns))
                         )[1]
                        )
  return
    for $x in (//namespace::*)[$i]
      return
        concat(name($x), ' :', string($x), '&#xA;')
Dimitre Novatchev
  • I get 4 nodes when: 1) I use the default parser in Java 6 using the xPath API. 2) I apply the xPath (as 1.0) using XSV, LIBXML, MSXML4.0, MSXML.NET, and Saxon-EE in oXygen 12.1. 3) I use your XSLT with Xalan (also in oXygen). I get the 16 when I use your XSLT with various flavors of Saxon (in oXygen). I don't understand why I'm getting different answers with Saxon. There must be something simple that I'm missing... – james.garriss Apr 18 '12 at 13:34
  • I'm not inclined to believe that the Java 6 parser, MSXML.NET, and Saxon-EE are "non-compliant (buggy) XPath 1.0 implementations." There must be something else... – james.garriss Apr 18 '12 at 14:06
  • @james.garriss: As I pointed out, I get the same result using MSXML and .NET -- these XSLT processors don't have separate/own XPath evaluation -- they use the available XPath engine. Therefore, there is something in your code that causes fewer nodes to be selected. When I have time I will add a C# code example, showing that 16 nodes are selected. – Dimitre Novatchev Apr 18 '12 at 15:16
  • @james.garriss: Added the promised C# example -- 16 nodes. Why do you think the .NET XPath evaluation produces something else? Clearly, the reason you get a different result is in your code. – Dimitre Novatchev Apr 18 '12 at 15:59
  • Indeed, it was something else, notably my (incorrect) use of oXygen. The drop-down that allowed me to select among parsers is **not** connected to the xPath queries but only to XSD validation. Thus what I believe is really happening is this: When I select xPath 1.0, oXygen is using Xalan under the hood, and Xalan's implementation is non-compliant. When I select xPath 2.0, it uses Saxon (which is compliant) and I get the correct answer (16 nodes). – james.garriss Apr 18 '12 at 17:11
  • Bottom line: When you said I was using a "non-compliant (buggy) XPath 1.0 implementation," you were correct. I was using Xalan in my Java code and Xalan in oXygen. – james.garriss Apr 18 '12 at 17:12
  • @james.garriss: Glad my answer was useful. – Dimitre Novatchev Apr 18 '12 at 17:40
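Since the root cause turned out to be not knowing which engine was doing the work, it can help to have the Java program report it. This is a minimal sketch assuming a current JDK (the class name is hypothetical). It prints the implementation classes that the JAXP factory lookup actually chose at runtime. On a stock JDK these are typically the JDK's internal Xalan/Xerces-derived classes. Another implementation on the classpath, or the JAXP system properties, can change what is reported.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.xpath.XPathFactory;

public class WhichEngine {

    // Reports the implementation classes the JAXP lookup selected for XPath, XSLT and parsing.
    static String describe() {
        return "XPath:  " + XPathFactory.newInstance().getClass().getName() + "\n"
             + "XSLT:   " + TransformerFactory.newInstance().getClass().getName() + "\n"
             + "Parser: " + DocumentBuilderFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```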

As the previous thread indicates, //namespace::* returns all the namespace nodes, of which there are 16 according to both the XPath 1.0 and XPath 2.0 specifications. It wouldn't surprise me if you've found an implementation that doesn't implement the spec correctly.
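The count of 16 follows from the data model: every element gets one namespace node per in-scope prefix binding, including the implicit xml binding. The same count can be reproduced without any XPath engine by tracking in-scope bindings over a namespace-aware DOM. This is a minimal sketch assuming a current JDK. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NamespaceNodeCount {

    // Counts one namespace node per in-scope prefix binding, on every element.
    static int countNamespaceNodes(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        Map<String, String> implicit = new HashMap<>();
        implicit.put("xml", XMLConstants.XML_NS_URI); // in scope on every element
        return count(doc.getDocumentElement(), implicit);
    }

    private static int count(Element e, Map<String, String> inherited) {
        Map<String, String> scope = new HashMap<>(inherited);
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                // xmlns="..." binds the default (empty) prefix; xmlns="" undeclares it.
                String prefix = a.getPrefix() == null ? "" : a.getLocalName();
                if (a.getValue().isEmpty()) {
                    scope.remove(prefix);
                } else {
                    scope.put(prefix, a.getValue());
                }
            }
        }
        int total = scope.size();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                total += count((Element) c, scope);
            }
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        String sample = "<root xmlns:ele='element.com' xmlns:att='attribute.com' xmlns:txt='textnode.com'>"
                + "<ele:one>a</ele:one><two att:c='d'>e</two><three>txt:f</three></root>";
        System.out.println(countNamespaceNodes(sample)); // prints 16 (4 elements x 4 bindings)
    }
}
```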

Finding all the namespace declarations (as distinct from namespace nodes) is not in general possible with either XPath 1.0 or XPath 2.0, because the following two documents are considered equivalent at the data model level:

document A:

<a xmlns="one">
  <b/>
</a> 

document B:

<a xmlns="one">
  <b xmlns="one"/>
</a>
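This equivalence is a property of the XPath data model, not of every API. A DOM built by a namespace-aware JAXP parser keeps the xmlns attributes, so it can still tell A and B apart. This is a minimal sketch assuming a current JDK (class and method names hypothetical). In both documents `b` is in namespace `one`, and only the number of declarations written on `b` itself differs (0 versus 1).

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

public class EquivalentDocuments {

    // Parses with namespace awareness and returns the first element whose local name is "b".
    static Element elementB(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return (Element) doc.getElementsByTagNameNS("*", "b").item(0);
    }

    // Counts xmlns attributes written on this element itself (inherited bindings are not counted).
    static int declarationsOn(Element e) {
        int n = 0;
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(attrs.item(i).getNamespaceURI())) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws Exception {
        String[] docs = { "<a xmlns='one'><b/></a>", "<a xmlns='one'><b xmlns='one'/></a>" };
        for (String xml : docs) {
            Element b = elementB(xml);
            // Namespace and in-scope default binding are identical in both documents.
            String binding = b.getNamespaceURI() + " " + b.lookupNamespaceURI(null);
            // Only the DOM still records whether b carries its own declaration.
            System.out.println(binding + ", declarations on b: " + declarationsOn(b));
        }
    }
}
```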

But if we define a "significant namespace declaration" to be a namespace that is present on a child element but not on its parent, then you could try this XPath 2.0 expression:

for $e in //* return
  for $n in $e/namespace::* return
     if (not(some $p in $e/../namespace::* satisfies ($p/name() eq $n/name() and string($p) eq string($n))))
     then concat($e/name(), '->', $n/name(), '=', string($n))
     else ()
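The same definition can be applied outside XPath. This is a minimal sketch assuming a current JDK and a namespace-aware DOM (class and method names hypothetical). It tracks in-scope bindings while walking the tree and reports each binding that is new or changed relative to the parent, in the same `element->prefix=uri` shape as the expression above. The root is compared against the document node, which has no namespace nodes, so the implicit xml binding is reported once, on the root.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SignificantDeclarations {

    // Lists "element->prefix=uri" for each binding in scope on an element but not on its parent.
    static List<String> significant(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        // The document node has no namespace nodes, so every binding on the root is reported.
        walk(doc.getDocumentElement(), new TreeMap<>(), out);
        return out;
    }

    private static void walk(Element e, Map<String, String> parentScope, List<String> out) {
        Map<String, String> scope = new TreeMap<>(parentScope);
        scope.put("xml", XMLConstants.XML_NS_URI);
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr a = (Attr) attrs.item(i);
            if (XMLConstants.XMLNS_ATTRIBUTE_NS_URI.equals(a.getNamespaceURI())) {
                String prefix = a.getPrefix() == null ? "" : a.getLocalName();
                if (a.getValue().isEmpty()) {
                    scope.remove(prefix); // xmlns="" undeclares the default namespace
                } else {
                    scope.put(prefix, a.getValue());
                }
            }
        }
        for (Map.Entry<String, String> ns : scope.entrySet()) {
            if (!ns.getValue().equals(parentScope.get(ns.getKey()))) {
                out.add(e.getNodeName() + "->" + ns.getKey() + "=" + ns.getValue());
            }
        }
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) c, scope, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root><bar:one xmlns:bar='http://www.bar.com'>alpha</bar:one>"
                + "<bar:two xmlns:bar='http://www.bar.com'>bravo</bar:two></root>";
        significant(xml).forEach(System.out::println);
    }
}
```

On the `bar` document from the question this reports `root->xml=...`, `bar:one->bar=http://www.bar.com` and `bar:two->bar=http://www.bar.com`, keeping both declarations of the same namespace.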
Michael Kay
  • While I very much want this answer to work, as it gets to the real heart of my problem, this xPath does not return anything different from the one I'm already using (//namespace::*). Thanks, though, for trying. – james.garriss Apr 18 '12 at 18:51

Here are my results using the XPath 1.0 implementations of .NET's XPathDocument (XSLT/XPath 1.0 data model), XmlDocument (DOM data model) and MSXML 6's DOM. The test code, run against your sample XML document, is:

    Console.WriteLine("XPathDocument:");
    XPathDocument xpathDoc = new XPathDocument("../../XMLFile4.xml");
    foreach (XPathNavigator nav in xpathDoc.CreateNavigator().Select("//namespace::*"))
    {
        Console.WriteLine("Node type: {0}; name: {1}; value: {2}.", nav.NodeType, nav.Name, nav.Value);
    }
    Console.WriteLine();

    Console.WriteLine("DOM XmlDocument:");
    XmlDocument doc = new XmlDocument();
    doc.Load("../../XMLFile4.xml");
    foreach (XmlNode node in doc.SelectNodes("//namespace::*"))
    {
        Console.WriteLine("Node type: {0}; name: {1}; value: {2}.", node.NodeType, node.Name, node.Value);
    }
    Console.WriteLine();


    Console.WriteLine("MSXML 6 DOM:");
    dynamic msxmlDoc = Activator.CreateInstance(Type.GetTypeFromProgID("Msxml2.DOMDocument.6.0"));
    msxmlDoc.load("../../XMLFile4.xml");
    foreach (dynamic node in msxmlDoc.selectNodes("//namespace::*"))
    {
        Console.WriteLine("Node type: {0}; name: {1}; value: {2}.", node.nodeType, node.name, node.nodeValue);
    }

and its output is

XPathDocument:
Node type: Namespace; name: txt; value: textnode.com.
Node type: Namespace; name: att; value: attribute.com.
Node type: Namespace; name: ele; value: element.com.
Node type: Namespace; name: xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Namespace; name: txt; value: textnode.com.
Node type: Namespace; name: att; value: attribute.com.
Node type: Namespace; name: ele; value: element.com.
Node type: Namespace; name: xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Namespace; name: txt; value: textnode.com.
Node type: Namespace; name: att; value: attribute.com.
Node type: Namespace; name: ele; value: element.com.
Node type: Namespace; name: xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Namespace; name: txt; value: textnode.com.
Node type: Namespace; name: att; value: attribute.com.
Node type: Namespace; name: ele; value: element.com.
Node type: Namespace; name: xml; value: http://www.w3.org/XML/1998/namespace.

DOM XmlDocument:
Node type: Attribute; name: xmlns:txt; value: textnode.com.
Node type: Attribute; name: xmlns:att; value: attribute.com.
Node type: Attribute; name: xmlns:ele; value: element.com.
Node type: Attribute; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Attribute; name: xmlns:txt; value: textnode.com.
Node type: Attribute; name: xmlns:att; value: attribute.com.
Node type: Attribute; name: xmlns:ele; value: element.com.
Node type: Attribute; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Attribute; name: xmlns:txt; value: textnode.com.
Node type: Attribute; name: xmlns:att; value: attribute.com.
Node type: Attribute; name: xmlns:ele; value: element.com.
Node type: Attribute; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: Attribute; name: xmlns:txt; value: textnode.com.
Node type: Attribute; name: xmlns:att; value: attribute.com.
Node type: Attribute; name: xmlns:ele; value: element.com.
Node type: Attribute; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.

MSXML 6 DOM:
Node type: 2; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: 2; name: xmlns:ele; value: element.com.
Node type: 2; name: xmlns:att; value: attribute.com.
Node type: 2; name: xmlns:txt; value: textnode.com.
Node type: 2; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: 2; name: xmlns:ele; value: element.com.
Node type: 2; name: xmlns:att; value: attribute.com.
Node type: 2; name: xmlns:txt; value: textnode.com.
Node type: 2; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: 2; name: xmlns:ele; value: element.com.
Node type: 2; name: xmlns:att; value: attribute.com.
Node type: 2; name: xmlns:txt; value: textnode.com.
Node type: 2; name: xmlns:xml; value: http://www.w3.org/XML/1998/namespace.
Node type: 2; name: xmlns:ele; value: element.com.
Node type: 2; name: xmlns:att; value: attribute.com.
Node type: 2; name: xmlns:txt; value: textnode.com.

So it is certainly not an XPath 1.0 versus XPath 2.0 problem. I think the problem you see is a shortcoming of mapping the XPath data model, with its namespace nodes, onto the DOM model, with its attribute nodes. Someone more familiar with the Java XPath API would need to tell you whether the behaviour you see is legitimately implementation-dependent (because the API specification is not precise enough about mapping the XPath namespace axis to the DOM model) or whether it is a bug.

Martin Honnen
  • I agree that it's not an xPath 1.0 vs 2.0 problem, but I'm not (yet) inclined to think the issue is the xPath API in Java 6 (though it may well have various shortcomings) **because** when I swap out the default XML parser in Java 6 (Xalan) with Saxon 9 HE (while making **no changes** to my Java code), it works (that is, it returns 16 nodes instead of 4). This leads me to conclude that Xalan's implementation is the real cause. – james.garriss Apr 18 '12 at 16:40