
In the following code I would expect it to select every node but it only selects every second node. Is this the correct behaviour or a bug?

<?php
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML('<root xmlns:ns="foo" ns:id="1"><one/><two/><three/><four/><five/></root>');
$finder = new DOMXPath($doc);
$nodes = $finder->query("//*[namespace::ns]");
foreach ($nodes as $n) {
  var_dump($doc->saveXML($n, LIBXML_NOEMPTYTAG));
}
?>

Output:

string '<root xmlns:ns="foo" ns:id="1"><one></one><two></two><three></three><four></four><five></five></root>' (length=101)
string '<two></two>' (length=11)
string '<four></four>' (length=13)
CJ Dennis

2 Answers


First of all, none of the elements -- it's elements that //* selects -- in your example XML are in a namespace.

Second, the implementation of the namespace axis in XPath 1.0 is often shoddy. Don't be surprised to see it not working properly, as it appears not to be here.

For better results, use this XPath to select for elements in the foo namespace:

"//*[namespace-uri() = 'foo']"

Although, as mentioned, there are no elements in the foo namespace in your example; there's only an attribute. To select all attributes in the foo namespace (one in this case), use the following XPath:

"//@*[namespace-uri() = 'foo']"
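As a rough illustration, here are the two expressions above run against the XML from the question. This is only a minimal sketch; the variable names are chosen for illustration.

```php
<?php
// Same document as in the question.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML('<root xmlns:ns="foo" ns:id="1"><one/><two/><three/><four/><five/></root>');
$finder = new DOMXPath($doc);

// Elements whose own name is in the "foo" namespace (none in this document).
$elements = $finder->query("//*[namespace-uri() = 'foo']");

// Attributes whose name is in the "foo" namespace (only ns:id here).
$attributes = $finder->query("//@*[namespace-uri() = 'foo']");

echo 'elements: ', $elements->length, "\n";
foreach ($attributes as $attr) {
    echo 'attribute: ', $attr->nodeName, '=', $attr->value, "\n";
}
```

Neither query touches the namespace axis, so they should not depend on how well that axis is implemented.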

Update:

"I think I see the confusion. I'm not looking for nodes within that namespace but nodes where the namespace is available (i.e. to other nodes)."

By "available" perhaps you mean declared? In that case, you have two challenges:

  1. As mentioned above, implementation of the namespace axis is often weak.
  2. At a data model level, there are equivalences between XML documents that declare namespaces at different nodes such that the declaration location can be unclear.

For more information, see Find all namespace declarations in an XML document - xPath 1.0 vs xPath 2.0.

kjhughes

This appears to be a bug in the libxml used by PHP. The query returns only the nodes root, two and four.

It gets even weirder when using //*[namespace::*]: then you get the root node and the nodes one, three and five.

In-scope namespaces are inherited by all descendant elements, so every element should match. The implementation appears to handle that inheritance incorrectly.
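Because in-scope namespaces are inherited, one possible way to sidestep the namespace axis is to walk every element and ask the DOM whether the prefix resolves there. This is a minimal sketch, assuming the prefix ns and the URI foo from the question. The helper name elementsWithNamespaceInScope is hypothetical, and I have not checked it against every libxml version.

```php
<?php
/**
 * Hypothetical helper: returns every element on which $prefix resolves to $uri,
 * using DOMNode::lookupNamespaceURI() instead of the XPath namespace axis.
 */
function elementsWithNamespaceInScope(DOMDocument $doc, string $prefix, string $uri): array
{
    $matches = [];
    // '*' matches every element in document order, including the root.
    foreach ($doc->getElementsByTagName('*') as $element) {
        if ($element->lookupNamespaceURI($prefix) === $uri) {
            $matches[] = $element;
        }
    }
    return $matches;
}

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML('<root xmlns:ns="foo" ns:id="1"><one/><two/><three/><four/><five/></root>');

$names = [];
foreach (elementsWithNamespaceInScope($doc, 'ns', 'foo') as $element) {
    $names[] = $element->nodeName;
}
echo implode(', ', $names), "\n";
```

This checks whether the namespace is in scope, not where it is declared, so it answers the same question as //*[namespace::ns] without relying on the XPath engine.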


Using the following XSLT produces all the desired nodes:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <matches>
      <xsl:copy-of select="//*[namespace::ns]"/>
    </matches>
  </xsl:template>
</xsl:stylesheet>
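To try this stylesheet from PHP, something like the following should work. It is only a sketch: it assumes the xsl extension (XSLTProcessor) is installed, and the stylesheet is inlined as a string purely for brevity.

```php
<?php
// Same document as in the question.
$doc = new DOMDocument('1.0', 'UTF-8');
$doc->loadXML('<root xmlns:ns="foo" ns:id="1"><one/><two/><three/><four/><five/></root>');

// The stylesheet from this answer, inlined as a nowdoc string.
$xslt = <<<'XSL'
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <matches>
      <xsl:copy-of select="//*[namespace::ns]"/>
    </matches>
  </xsl:template>
</xsl:stylesheet>
XSL;

$style = new DOMDocument();
$style->loadXML($xslt);

// Requires ext-xsl.
$processor = new XSLTProcessor();
$processor->importStylesheet($style);

// transformToXml() returns the serialized result as a string.
$result = $processor->transformToXml($doc);
echo $result;
```

The copied elements end up as children of the matches element, so the result can be loaded into a new DOMDocument if the nodes are needed for further processing.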

The XPath itself could simply be changed to //* to select all elements, regardless of namespace. Also, as kjhughes pointed out, you are not really using namespaces for your elements, only for the attribute(s).

Kevin Sandow
  • The code I posted is a minimal example to show the bug. In my actual XML I have multiple copies of the same namespace inserted randomly throughout as it is built up dynamically and the library only includes the namespace when absolutely necessary. When the XML document is complete I remove the namespaces but I'm having trouble finding them all because of the bug. I have to keep checking if there are any left. – CJ Dennis May 28 '15 at 13:45
  • Then how do you expect the attributes or even nodes with a namespace to behave: simply remove them, or cut off the namespace? Either way, XSLT might be a solution for your problem. – Kevin Sandow May 28 '15 at 15:53
  • When I remove the namespace(s) the library automatically moves any elements or attributes with that namespace into the default namespace, i.e. with no prefix. This part works perfectly whenever it finds a namespace to remove. At the moment I have to keep looping until the namespace can't be found anymore but if the library detected them properly the first time I could remove them in a single step. – CJ Dennis May 29 '15 at 09:34