
I'm trying to match any instance of a specific element that lacks an xmlns attribute, but I'm having trouble getting a match with the syntax. My xml is as shown:

<root>
<node xmlns:m="http://google.com"/>
<node style="block"/>
</root>

I want to return the first node, but not the second. If I were matching based on the style attribute shown on the second node, I could simply use not(@style) but this doesn't work for not(@xmlns:m). I've tried to circumvent this by searching for any attribute with a value that matches the URI, but again, this works for other attributes, but not xmlns:m. Is there some sort of limitation or syntax quirk that's required to match/parse xmlns attributes with XPath?

Mathias Müller
wolfmason
  • Please show your Schematron rules. `xmlns` is not an attribute, it is a namespace declaration. What do you mean by "I want to **return** the first node"? Schematron does not return nodes, it only makes assertions. Finally, _why_ do you need to detect namespace declarations? – Mathias Müller Feb 17 '16 at 20:49
  • I want to underscore Mathias' question, *why* do you need to detect namespace declarations? It's like downloading an executable (compiled) program and trying to figure out how many comments the developer used: It's not supposed to matter. All that matters is which attributes (or elements) are in what namespaces. Which is generally independent of where the namespace declarations occur. – LarsH Feb 17 '16 at 22:19
  • Sorry, I didn't include the Schematron because I strongly suspected the issue was related to XPath. I only included Schematron in the title on the off chance this was a Schematron-specific issue. As for "why" do I need to detect these -- I have a 3rd party tool that renders certain xml elements as flat images, but it fails when the namespace is missing. I want Schematron to "return" (catch, match, find, etc.) these elements and make an assertion before they fail at the build stage. *I* don't need to see them, but my crummy 3rd party tool does for whatever reason. – wolfmason Feb 18 '16 at 16:26
  • wolfmason, namespace declarations usually only matter if some elements or attributes actually _are_ in that namespace - otherwise they do not harm. Your question gives the impression that you are trying to detect namespaces that are _not_ used in the XML document. If your 3rd party tool cares about those, it's designed really badly. – Mathias Müller Feb 18 '16 at 17:09
  • Does your tool fail when the **namespace declaration** is missing, or when the relevant elements are **not in the right namespace**? The latter is how it *should* work, and is easy for XPath and Schematron to detect. When you say "the namespace is missing," it sounds like you're confusing the two concepts. Straightening that out is probably the key to solving this problem. IMO the most likely answer is that the 3rd-party tool cares about the namespaces of the elements, not about namespace declarations. – LarsH Feb 19 '16 at 16:54
  • I recommend this article https://msdn.microsoft.com/en-us/library/ms950779.aspx for understanding namespaces, although it's kind of long. – LarsH Feb 19 '16 at 17:00

2 Answers


Is there some sort of limitation or syntax quirk that's required to match/parse xmlns attributes with XPath?

Yes, kind of. The quirk is that things like

xmlns:m="..."

syntactically are attributes, but serve a more specific role than ordinary attributes. They are namespace declarations that bind a prefix to a namespace URI. The prefix can then be used to qualify element and attribute names. There is also a default namespace, declared with a plain xmlns="..." attribute, that is not bound to any prefix.

It is impossible to detect namespace declarations directly with XPath, because XPath (and XSLT, and Schematron) do not operate on actual XML documents, but on abstract representations of them. In this representation (a data model), namespace declarations are absent; instead, each element has namespace nodes that represent the prefix-to-URI bindings in scope for it, and those bindings are derived from the declarations.
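The same separation shows up outside XPath. As a rough analogy only (Python's standard-library `xml.etree.ElementTree`, not an XPath engine), once the document is parsed, `xmlns:m` is no longer in the element's attribute map while an ordinary attribute such as `style` still is, which mirrors why `@xmlns:m` cannot be tested the way `@style` can:

```python
import xml.etree.ElementTree as ET

XML_DOC = """<root>
<node xmlns:m="http://google.com"/>
<node style="block"/>
</root>"""

root = ET.fromstring(XML_DOC)

# The parser treats xmlns:m as a namespace declaration rather than data,
# so it never lands in .attrib; the ordinary style attribute does.
attribs = [node.attrib for node in root.findall("node")]
print(attribs)  # [{}, {'style': 'block'}]
```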

Once an XML parser has processed an XML document, namespaces and attributes are distinct types of nodes that you can access with XPath axes. I am not sure I understand why you would want to do that, but you can report namespace nodes using the namespace:: axis:

namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')]

You have to be careful and exclude the predefined namespace URI

http://www.w3.org/XML/1998/namespace

which is bound to the xml: prefix by default.
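To make the namespace-node model concrete, here is a minimal Python sketch (an approximation, not an XPath engine). The hypothetical helper `namespace_scopes` rebuilds each element's in-scope bindings from the `start-ns` events of `xml.etree.ElementTree.iterparse`, adds the implicit `xml` prefix, and then `has_namespace_node` applies the same test as the expression above to the document from the question:

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

def namespace_scopes(xml_text):
    """Return [(tag, scope, parent_scope)] in document order (hypothetical helper).

    scope maps prefix -> URI for every binding in scope on the element, plus
    the implicit xml prefix, approximating its XPath 1.0 namespace nodes.
    parent_scope is {} for the root element: the document node has none.
    """
    rows, stack, pending = [], [{}], {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item          # belongs to the next start tag
            pending[prefix] = uri
        elif event == "start":
            scope = {"xml": XML_NS, **stack[-1], **pending}
            scope = {p: u for p, u in scope.items() if u}  # xmlns="" unbinds
            rows.append((item.tag, scope, stack[-1]))
            stack.append(scope)
            pending = {}
        else:                           # "end"
            stack.pop()
    return rows

def has_namespace_node(scope):
    # namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')]
    return any(uri != XML_NS for uri in scope.values())

doc = """<root>
<node xmlns:m="http://google.com"/>
<node style="block"/>
</root>"""

flags = [(tag, has_namespace_node(s)) for tag, s, _ in namespace_scopes(doc)]
print(flags)  # [('root', False), ('node', True), ('node', False)]
```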

ISO Schematron

<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" queryBinding="xslt2">

    <sch:pattern>
        <sch:rule context="node">
            <sch:report test="namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')]">Namespace node found!</sch:report>
        </sch:rule>
    </sch:pattern>

</sch:schema>

The document you show will not be valid against this SCH file and the Schematron validator will point to the node element with the namespace declaration:

<node xmlns:m="http://google.com"/>

as the source of the error.


Please Note

The namespace::* axis selects namespace nodes, not namespace declarations. A namespace binding stays in scope for all descendants of the element that declares it, so that element is not the only one with a namespace node for it: every one of its descendants has one as well:

<root>
  <node xmlns:m="http://google.com">
    <descendant_element_with_namespace_node/>
  </node>
  <node style="block"/>
</root>

See LarsH's answer for a more sophisticated XPath expression that accounts for this fact.
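The inheritance can be reproduced with a rough Python model (a sketch under stated assumptions: the hypothetical helper `namespace_scopes` rebuilds in-scope bindings from `ElementTree.iterparse` `start-ns` events and approximates XPath namespace nodes; it does not run XPath). For the document above, the test `namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')]` is true for the descendant even though it carries no declaration:

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

def namespace_scopes(xml_text):
    """Return [(tag, scope, parent_scope)] in document order (hypothetical helper)."""
    rows, stack, pending = [], [{}], {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item          # belongs to the next start tag
            pending[prefix] = uri
        elif event == "start":
            # Children start from the parent's bindings: this is the inheritance.
            scope = {"xml": XML_NS, **stack[-1], **pending}
            scope = {p: u for p, u in scope.items() if u}  # xmlns="" unbinds
            rows.append((item.tag, scope, stack[-1]))
            stack.append(scope)
            pending = {}
        else:                           # "end"
            stack.pop()
    return rows

def has_namespace_node(scope):
    # namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')]
    return any(uri != XML_NS for uri in scope.values())

doc = """<root>
  <node xmlns:m="http://google.com">
    <descendant_element_with_namespace_node/>
  </node>
  <node style="block"/>
</root>"""

flags = [(tag, has_namespace_node(s)) for tag, s, _ in namespace_scopes(doc)]
print(flags)
```

Only the first `node` declares `m`, yet both it and its child are flagged.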

Mathias Müller
  • Mathias, of course you're right that things like `xmlns:m` are namespace declarations, but do you have a reference for the statement that they are not attributes? https://www.w3.org/TR/xml-names11/#ns-decl calls them attributes. Which leaves open the question of why they're not selected by the `attribute::` axis. – LarsH Feb 17 '16 at 22:04
  • I don't see anything in the XPath spec that explains the latter. https://www.w3.org/TR/xml-infoset/#infoitem.attribute says that "There is an attribute information item for each attribute (specified or defaulted) of each element in the document, including those which are namespace declarations. The latter however appear as members of an element's [namespace attributes] property rather than its [attributes] property." – LarsH Feb 17 '16 at 22:13
  • @LarsH It is definitely imprecise to say that `xmlns:m` is not an attribute, even if I wrote it to emphasize that conceptually, attributes and namespaces are very different. I'll amend my answer as soon as I find time. Thanks for pointing this out. – Mathias Müller Feb 17 '16 at 22:23
  • @LarsH _"Which leaves open the question of why they're not selected by the attribute:: axis."_ That's an interesting question that I do not know the answer to right now. My suspicion is that this is because XPath expressions operate on an XDM model of the document, where attributes and namespaces were already separated by the XML parser. I think we are mixing two different levels of organization here: the actual appearance of an XML document (where namespace declarations are attributes) and the parsed model of the document (where namespace nodes are not attribute nodes). – Mathias Müller Feb 17 '16 at 22:25
  • On a somewhat different topic, the XPath expression you gave for selecting namespaces, `namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')]`, selects namespace *nodes* rather than namespace *declarations*. But an element must have a namespace node for every declaration that's *in scope* (https://www.w3.org/TR/xpath-datamodel/#NamespaceNode). So all descendants of the element that declares `m` would have a namespace node for `m`, even if they have no namespace declaration. – LarsH Feb 17 '16 at 22:41
  • The latter would be OK for the OP's given example, but would give wrong results in other cases. Namespace declarations really aren't (easily) detectable, by design. – LarsH Feb 17 '16 at 22:42
  • @LarsH Thanks for your help Lars! It is very much appreciated. I have tried to improve my answer, let me know what you think. – Mathias Müller Feb 18 '16 at 08:31
  • Thanks to you both so much for the excellent explanations! I did quite a bit of searching and couldn't find a resource explaining why namespaces were not treated like common attributes in xpath. – wolfmason Feb 18 '16 at 16:48
  • The improved answer is good. I also just edited my answer, since I realized it was not correct for all cases either. – LarsH Feb 19 '16 at 03:30

As stated elsewhere, the question asks for something that XPath, and XML tools in general, are not designed to do: extract information about namespace declarations. XPath is designed to be able to reliably detect what namespace (as identified by its namespace URI, not its prefix) any element or attribute is in, and to select nodes based on their namespace. For that reason, any method to detect namespace declarations using standard XML tools is doomed to be unreliable.

Building on Mathias' answer, I would say to use this XPath test:

namespace::*[not(. = 'http://www.w3.org/XML/1998/namespace')
         and not(. = ../../namespace::*)]

(tested using http://www.qutoric.com/xslt/analyser/xpathtool.html). In a case like

<root>
  <node xmlns:m="http://google.com">
    <node style="block"/>
  </node>
</root>

the above XPath expression is truthy for only one node element, the outer one, thus satisfying the OP's question; whereas Mathias' expression would be truthy for both node elements.

It works by testing for namespace nodes (on the current element) whose namespace URIs are not shared by the parent element's namespace nodes.
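A rough Python model of this test (a sketch, not an XPath engine): the hypothetical helper `namespace_scopes` rebuilds each element's in-scope bindings, and its parent's, from `ElementTree.iterparse` `start-ns` events, and the hypothetical `new_namespace_node` mirrors the predicate by comparing URIs against the parent's. For the root element the parent is the document node, which has no namespace nodes, so its scope is empty. Running both tests on the example above:

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

def namespace_scopes(xml_text):
    """Return [(tag, scope, parent_scope)] in document order (hypothetical helper)."""
    rows, stack, pending = [], [{}], {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item          # belongs to the next start tag
            pending[prefix] = uri
        elif event == "start":
            scope = {"xml": XML_NS, **stack[-1], **pending}
            scope = {p: u for p, u in scope.items() if u}  # xmlns="" unbinds
            rows.append((item.tag, scope, stack[-1]))
            stack.append(scope)
            pending = {}
        else:                           # "end"
            stack.pop()
    return rows

def new_namespace_node(scope, parent_scope):
    # namespace::*[not(. = XML_NS) and not(. = ../../namespace::*)]
    parent_uris = set(parent_scope.values())
    return any(u != XML_NS and u not in parent_uris for u in scope.values())

doc = """<root>
  <node xmlns:m="http://google.com">
    <node style="block"/>
  </node>
</root>"""

rows = namespace_scopes(doc)
larsh = [(tag, new_namespace_node(s, p)) for tag, s, p in rows]
mathias = [(tag, any(u != XML_NS for u in s.values())) for tag, s, _ in rows]
print(larsh)    # [('root', False), ('node', True), ('node', False)]
print(mathias)  # [('root', False), ('node', True), ('node', True)]
```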

However, this XPath expression will not always detect namespace declarations either. For example, in

<root>
  <node xmlns:m="http://google.com">
    <node xmlns:g="http://google.com" style="block"/>
  </node>
</root>  

the above XPath expression would be truthy only for the outer node, and would not detect the namespace declaration on the inner one. Again, this is because namespace declarations were intended only as a way to make it easier to specify what elements and attributes were in what namespaces, not as significant information carriers in themselves.

Granted, the above example seems unrealistic, because the inner namespace declaration is redundant. Nevertheless it is well-formed XML, and could easily be generated by well-behaved programs that produce the inner <node> without direct knowledge of the outer <node>'s namespace declarations.
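The same rough Python model (hypothetical helpers built on `ElementTree.iterparse` `start-ns` events rather than XPath) reproduces this miss. It also tries a prefix-aware variant along the lines of the P.S. below; `new_binding` is a hypothetical name. That variant does catch `xmlns:g`, but a verbatim redeclaration of `xmlns:m` on the inner element still goes unnoticed, because it changes nothing in the element's in-scope bindings:

```python
import io
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"

def namespace_scopes(xml_text):
    """Return [(tag, scope, parent_scope)] in document order (hypothetical helper)."""
    rows, stack, pending = [], [{}], {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "start", "end")):
        if event == "start-ns":
            prefix, uri = item          # belongs to the next start tag
            pending[prefix] = uri
        elif event == "start":
            scope = {"xml": XML_NS, **stack[-1], **pending}
            scope = {p: u for p, u in scope.items() if u}  # xmlns="" unbinds
            rows.append((item.tag, scope, stack[-1]))
            stack.append(scope)
            pending = {}
        else:                           # "end"
            stack.pop()
    return rows

def new_namespace_node(scope, parent_scope):
    # URI-only test: namespace::*[not(. = XML_NS) and not(. = ../../namespace::*)]
    parent_uris = set(parent_scope.values())
    return any(u != XML_NS and u not in parent_uris for u in scope.values())

def new_binding(scope, parent_scope):
    # Hypothetical prefix-aware variant: a (prefix, URI) pair the parent lacks.
    return any(u != XML_NS and parent_scope.get(p) != u for p, u in scope.items())

def check(xml_text, test):
    return [test(s, p) for _, s, p in namespace_scopes(xml_text)]

TEMPLATE = """<root>
  <node xmlns:m="http://google.com">
    <node xmlns:{0}="http://google.com" style="block"/>
  </node>
</root>"""

print(check(TEMPLATE.format("g"), new_namespace_node))  # [False, True, False]
print(check(TEMPLATE.format("g"), new_binding))         # [False, True, True]
print(check(TEMPLATE.format("m"), new_binding))         # [False, True, False]
```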

Additional caveat: The namespace:: axis is deprecated in XPath 2.0 and later, so it may not be supported by whatever engine you use to run Schematron.

LarsH
  • P.S. The XPath expression could (maybe) be improved by taking into account the namespace prefix of each namespace node, in addition to its namespace URI. Nevertheless it would still not work in all cases. – LarsH Feb 19 '16 at 03:32