
I'm trying to query a BaseX database that contains more than 1,500,000 items. When I run this query

for $item in collection('coll')//item
    return $item (: returns an xml element :)

it executes in less than a second.

But when I try to return the result wrapped in an XML element, I get an "Out of main memory" error:

<xml>{
    for $item in collection('coll')//item
    return $item
}</xml>

This makes me want to abandon the native XML database approach (the same happens with other databases, such as eXist-db), so any information on this problem would be extremely helpful.

Thanks

unicorn

2 Answers


Due to the semantics of XQuery, all child nodes must be copied when they are wrapped by a newly constructed parent node: the element constructor creates new nodes with their own identity. The following query demonstrates this by comparing the identity of the original node and its copy. It yields false:

let $node := <node/>
let $parent := <parent>{ $node }</parent>
return $parent/node is $node

As the copies must be held in main memory, copying millions of nodes is expensive and inevitably leads to an out-of-memory error.

If you write results to files, here is a pragmatic solution to get around this restriction:

(:~ 
 : Writes elements to a file, wrapped by a root node.
 : @param  $path      path to file
 : @param  $elements  elements to write
 : @param  $name      name of root node
 :)
declare function local:write-to(
  $path      as xs:string,
  $elements  as element()*,
  $name      as xs:string
) as empty-sequence() {
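  (: write the opening tag as plain text, so no new parent node is constructed :)
  file:write-text($path, '<' || $name || '>'),
  (: serialize the elements directly to the file, without wrapping (and thus copying) them :)
  file:append($path, $elements),
  (: write the closing tag as plain text :)
  file:append-text($path, '</' || $name || '>')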
};

local:write-to('result.xml', <result/>, 'root')

To anticipate criticism: this is clearly a hack. For example, the approach conflicts with various non-default serialization parameters of BaseX (if an XML declaration needs to be output, it will be written after the opening root tag, so the result will not be well-formed, etc.).

Christian Grün
  • Never thought that nodes are copied in this case... That explains it. I will try what you propose and come back with comments. Thank you very much! – unicorn Aug 21 '17 at 11:02

With BaseX 9.0, you can temporarily disable node copying via the COPYNODE option:

(# db:copynode false #) {
  <xml>{
    for $item in collection('coll')//item
    return $item
  }</xml>
}
Christian Grün
  • What does "temporarily" imply in this case? – unicorn Mar 13 '18 at 10:18
  • Temporarily is supposed to mean »inside the scope of the pragma«. – Christian Grün Mar 13 '18 at 14:22
  • If this has the same effect as the original, that is, it returns the wrapped nodes, I wonder why this is not the default behavior, since it is not resource-consuming... – unicorn Mar 14 '18 at 16:46
  • The pragma circumvents the semantics of the XQuery specification, so it was introduced as an optional feature. An example: `let $x := <x/> return (# db:copynode false #) { <_>{ $x }</_>/x is $x }` yields true; without the pragma, it yields false. – Christian Grün Mar 14 '18 at 19:56