In a directory, I have several XML files like this:

File #1: <root myAtt="one"/>

File #2: <root myAtt="two"/>

I want to select the first document. For this, I use the following query (assuming the directory is called "myDocs"):

collection('myDocs')[/root/@myAtt = 'one']

(I know I could use doc() to select the document of interest. But this example is just a simplified version of the real situation I'm facing, in which I have to work with a collection extracted from a database.)

If I run this query on Saxon-HE 9.6, I get what I expect: <root myAtt="one"/>. But if I run the same query on BaseX 8.3, I surprisingly get: <root myAtt="one"/><root myAtt="two"/>. Confusion ensues.

Apparently, the leading / of the path expression inside the predicate (a "rooted path expression" according to Dr. Kay in XSLT 2.0 and XPath 2.0 4th Edition) is being treated differently across implementations.

In this case, / is supposed to select the document node of the tree that contains the context node. And that is what Saxon does.
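To make that reading concrete, here is a minimal sketch. It assumes two in-memory documents built with document constructors as stand-ins for the files in "myDocs", and the variable name `$docs` is purely illustrative. It relies on the XPath definition of a leading `/` as an abbreviation for `(fn:root(self::node()) treat as document-node())/`, so both predicates should behave identically on a conforming processor and keep only the first document.

```xquery
(: Hypothetical stand-ins for the two files in 'myDocs'. :)
let $docs := (
  document { <root myAtt="one"/> },
  document { <root myAtt="two"/> }
)
return (
  (: Rooted path: "/" is resolved separately for each context item. :)
  $docs[/root/@myAtt = 'one'],

  (: The same predicate with the leading "/" written out in full. :)
  $docs[(fn:root(self::node()) treat as document-node())/root/@myAtt = 'one']
)
```

If the two filter expressions return different results, the processor is not resolving the leading `/` against the context item.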

But in BaseX, / seems to select the whole sequence of document nodes being filtered by the predicate. If I'm getting it right, that would explain why the predicate evaluates to true for every document: the general comparison operator = is true as soon as any item in the sequence matches, and there is always at least one item in the result sequence equal to 'one'.
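The existential behavior of `=` can be seen in isolation with plain strings. This is only a sketch; the sequence `$values` is a hypothetical stand-in for the attribute values that the predicate would see if `/` selected every document at once.

```xquery
(: Hypothetical stand-in for the atomized @myAtt values of all documents. :)
let $values := ('one', 'two')
return (
  $values = 'one',    (: true: at least one item equals 'one' :)
  $values = 'three'   (: false: no item equals 'three' :)
  (: $values eq 'one' would instead raise a type error (XPTY0004),
     because the value comparison eq accepts only single items. :)
)
```

With `=`, a predicate fed the full sequence is true for every context item, which would match the BaseX output described above.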

So, is the behavior of the / operator in rooted path expressions implementation-dependent?

ARX
  • In that case, I get a single document in both implementations. It's the same as if I use `collection('myDocs')[./root/@myAtt = 'one']`: one single document. Hence my asking about rooted path expressions. It's the only case in which the implementations' behaviors diverge. – ARX Dec 22 '15 at 21:10
  • Your confusion probably stems from the fact that you seem to think that the root node and the document element are the same thing. They are not. The document element is a child of the root node (which is special in that it may not have more than one element child). – Tomalak Dec 22 '15 at 21:38
  • @Tomalak: It's crystal clear that the document node and the root element are different things. That's the only way in which the predicate `[/root/@myAtt = 'one']` could work as it does in Saxon. The leading `/` matches the _document node_ of the tree containing the context node (which is itself a document node, since that's the kind of node returned by collection() in this case), while the first axis step, `root`, matches its only child: the _root element_. That's also why dropping the leading `/` makes both implementations output the same result: a single document. – ARX Dec 22 '15 at 22:21
  • Point taken. My comment was based on this statement: *"In this case, `/` is supposed to select the document node…"* – Tomalak Dec 22 '15 at 22:35
  • This has the smell of a BaseX optimizer bug, and I will create an issue to have the main developers think over what the behavior should be. Looking at the query information, you can see that BaseX pulls the underlying `db:open-pre` call _into_ the predicate, to optimize out the `root()` call. As far as I understand the semantics, this should not happen, and it does not happen if you add a `self` step in between: `collection('/tmp/myDocs')/self::node()[/root/@myAtt = 'one']`. – Jens Erat Dec 23 '15 at 08:10
  • I created [issue #1231](https://github.com/BaseXdb/basex/issues/1231); expect some feedback from the developers soon (we're pretty close to Christmas, though). – Jens Erat Dec 23 '15 at 08:24
  • @Jens Erat: Your dissection of the problem in issue #1231 helps a lot in understanding it. Thanks for taking the time to look into it. – ARX Dec 23 '15 at 14:41
  • @ARX: Christian Grün already uploaded a [new beta version of the next release, containing the bugfix](http://files.basex.org/releases/latest/). I'd probably go for the "current context" version from Christian's answer anyway, which is in my opinion easier to understand than going to the root. – Jens Erat Dec 23 '15 at 14:45
  • @Jens Erat: Yes, I had already noted that dropping the leading `/` fixes the problem. It's just that when you're quickly writing queries, you know that a sure way to grab the document node is to begin the path expression with `/`. In fact, I had the query running for over a year without noticing the problem, until I changed `=` to `eq`. Then, to my surprise, I began to get an error about comparing a sequence to a single item (something I had been unwittingly doing all this time). But I agree with you: in this case, the leading `/` is redundant, because the context node is already a document node. – ARX Dec 23 '15 at 15:22

1 Answer

Thanks for the observation. This was a bug in BaseX, which will be fixed in BaseX 8.4 (the fix is also available in the latest snapshot).

The following query is equivalent, as the current context item, which serves as input for the path in the predicate, will be the current root node anyway:

collection('myDocs')[root/@myAtt = 'one']
Christian Grün
  • In BaseX, it takes more time to describe a bug than to have it fixed. You can't do any better than that. Thank you! – ARX Dec 23 '15 at 15:00