
I would like to know how to modify an in-memory copy of an original document stored in the DB. I am very happy with the update extension, which lets me search and replace through text nodes and change them permanently. However, this behavior is not always what I want. On some occasions I need to export the document with minor changes made on the fly. eXist does not seem to support a copy expression, which is what I would otherwise have considered.

For permanent changes I use:

declare function cust-utils:replace-spaces-hard($document as xs:string) as empty-sequence() {
    let $doc := doc($document)/tei:TEI
    let $match := '(^|\s| )([szkvaiouSZKVAIOU])[\s]'
    for $i in (1 to 2)
    return
        for $text in $doc//text()
        return
            update value $text[matches(., $match)] with replace($text, $match, '$1$2 ')
};

(I iterate twice because XPath 2.0 does not seem to allow lookarounds in regexes, and the matches here sometimes overlap.)

How can I do the same temporarily? I tried the interesting function from Datypic, but it only returns particular nodes. I need to preserve the document order. Simply put, I need to go through a document tree, replace particular strings, and return the document for later use as it is, without updating it in the DB.

UPDATE

Unfortunately, this:

declare function cust-utils:copy($input as item()*) as item()* {
    for $node in $input
    return $node
};

does absolutely the same as

declare function cust-utils:copy($input as item()*) as item()* {
for $node in $input
   return 
      typeswitch($node)
        case element()
           return
              element { name($node) } {
                for $att in $node/@*
                   return
                      attribute { name($att) } { $att }
                ,
                (: output all the sub-elements of this element recursively :)
                for $child in $node
                   return cust-utils:copy($child/node())
              }
        default return $node
};

… It seems to return the document node without actually traversing it.

Honza Hejzl

1 Answer

eXist's XQuery Update extension writes all updates to the database and does not support in-memory operations. This is in contrast to the W3C XQuery Update Facility 1.0+, which is not supported in eXist. Thus, in eXist, in-memory updates must be performed with pure XQuery, i.e., without the additional syntax and functionality of a formal Update facility.

For in-memory updates with eXist, the traditional path is to perform an "identity transformation", typically using recursive typeswitch operations; see https://en.wikipedia.org/wiki/Identity_transform#Using_XQuery. A simple example showing transformation of text nodes, while preserving document order, is:

xquery version "3.0";

declare function local:transform($nodes as node()*) {
    for $node in $nodes
    return
        typeswitch ($node)
        case document-node() return 
            local:transform($node/node())
        case element() return 
            element {node-name($node)} {
                $node/@*, 
                local:transform($node/node())
            }
        case text() return 
            (: wrap the result in text { } so that a text node, not an xs:string, is returned :)
            text { upper-case($node) }
        (: drop comment & processing-instruction nodes :)
        default return 
            ()
};

let $node := 
    document {
        element root {
            comment { "sample document" },
            element x {
                text { "hello" },
                element y {
                    text { "there" }
                },
                text { "friend" }
            }
        }
    }
return 
    <results>
        <before>{$node}</before>
        <after>{local:transform($node)}</after>
    </results>

The result:

<results>
    <before>
        <root>
            <!--sample document-->
            <x>hello<y>there</y>friend</x>
        </root>
    </before>
    <after>
        <root>
            <x>HELLO<y>THERE</y>FRIEND</x>
        </root>
    </after>
</results>
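
For the use case in the question (a regex replacement in every text node, without touching the database), the same recursive pattern might be adapted roughly as follows. This is only a sketch under some assumptions: the function name `local:replace-in-memory` and the sample document are hypothetical, and the pattern and replacement assume the goal is to bind one-letter words to the following word with a non-breaking space (`&#160;`). Unlike the example above, it keeps comments and processing instructions and re-wraps the document node.

xquery version "3.0";

(: Hypothetical helper: returns a copy of $nodes in document order,
   applying replace() to every text node and copying everything else. :)
declare function local:replace-in-memory(
    $nodes as node()*,
    $pattern as xs:string,
    $replacement as xs:string
) as node()* {
    for $node in $nodes
    return
        typeswitch ($node)
        case document-node() return
            document { local:replace-in-memory($node/node(), $pattern, $replacement) }
        case element() return
            element { node-name($node) } {
                $node/@*,
                local:replace-in-memory($node/node(), $pattern, $replacement)
            }
        case text() return
            (: text { } ensures a text node, not an xs:string, is returned :)
            text { replace($node, $pattern, $replacement) }
        (: comments and processing instructions are copied unchanged :)
        default return
            $node
};

(: assumed pattern: a one-letter word preceded by start of string, whitespace or a non-breaking space :)
let $pattern := '(^|\s|&#160;)([szkvaiouSZKVAIOU])\s'
let $replacement := '$1$2&#160;'
(: hypothetical sample; in practice this could be doc($document)/tei:TEI :)
let $doc := document { <p>jsem v lese <hi>a k tomu</hi><!-- note --></p> }
(: two passes, since overlapping matches are not all caught in a single pass :)
let $first-pass := local:replace-in-memory($doc, $pattern, $replacement)
return
    local:replace-in-memory($first-pass, $pattern, $replacement)

The stored document is never modified, because only newly constructed nodes are returned. Document order is preserved because the recursion visits each node's children in order.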

An alternate approach is to use an in-memory update module, such as Ryan J. Dew's "XQuery XML Memory Operations" module, at https://github.com/ryanjdew/XQuery-XML-Memory-Operations. If you clone the repository (or download the repository's .zip file and unzip it) and upload the folder to eXist's /db collection, the following code will work (adapted from this old exist-open post: http://markmail.org/message/pfvu5omj3ctfzrft):

xquery version "3.0";

import module namespace mem="http://maxdewpoint.blogspot.com/memory-operations" 
    at "/db/XQuery-XML-Memory-Operations-master/memory-operations-pure-xquery.xqy";

let $node := <x>hello</x>
let $copy := mem:copy($node)
let $rename := mem:rename($copy, $node, fn:QName("foo", "y"))
let $replace-value := mem:replace-value($rename, $node, "world")
return
    mem:execute($replace-value) 

The result:

<y xmlns="foo">world</y>
Joe Wicentowski
  • To save you the trouble of updating the module's `map`-related code, I've forked Ryan J. Dew's module and created a branch with the changes I describe here: https://github.com/joewiz/XQuery-XML-Memory-Operations/tree/xdm-3.1. I've submitted this to Ryan as a PR, so hopefully the change makes its way back to his source repository. – Joe Wicentowski May 11 '16 at 19:25
  • Update: Ryan has accepted the PR, so I've updated my answer to simplify the installation. – Joe Wicentowski May 11 '16 at 20:18
  • Thanks for the recommendation. I would also be very happy if you reviewed the update where I am trying to use the `typeswitch`. I don’t understand why it does not work. – Honza Hejzl May 13 '16 at 11:01
  • On first glance, I'd suggest changing the `.` in the first argument of `replace(., $match, '##')` to `$text`. – Joe Wicentowski May 13 '16 at 11:23
  • Also: 1. `$node` has to be of type `element()`, otherwise it is returned intact. 2. Your handling of `text()` nodes could likely return them out of document order. A more complete use of recursive typeswitch would guarantee document order. – Joe Wicentowski May 13 '16 at 11:31
  • I have just tested the module. It works as expected. However, I think the problem is the same (from my point of view): how do I use it when traversing and returning the document in the right order? I still can't make it work. Those functions from Wikipedia don’t work as expected; they always return the whole document but without any real traversing. They do the same as a simple document-node return. – Honza Hejzl May 13 '16 at 14:48
  • I've just added a full typeswitch example to my answer, to illustrate my point #2 in the comment above. Let me know if you have any questions. – Joe Wicentowski May 13 '16 at 15:06
  • For your specific use - changing the value of text nodes - I think that Ryan J. Dew's module is actually not ideal, as it only lets you replace values with static values. If it let you pass functions instead of static values, this would be great. This reminded me of John Snelson's `transform.xq` project: https://github.com/jpcs/transform.xq. I tried to rig that up in eXist in the same way, but encountered a problem that I couldn't overcome. See https://github.com/jpcs/transform.xq/issues/2. – Joe Wicentowski May 13 '16 at 18:32
  • Thanks a lot! Brilliant as always… Your example helps a lot, it is the clearest example of working with `typeswitch` I have seen so far! It seems very reasonable for my case. I will try it and let you know. – Honza Hejzl May 13 '16 at 18:43
  • Whilst testing, I found another obstacle: when applied to a real TEI document, it throws `err:XPTY0004: xs:string(Some title) is not a sub-type of node() …`. The _Some title_ is, of course, the very first text node in the document. Really confusing! I would think this needs some special treatment because of automatic atomisation during the process. – Honza Hejzl May 15 '16 at 08:39
  • Hard to say why without seeing the code. Perhaps best resolved over email? – Joe Wicentowski May 15 '16 at 12:28
  • Updated the typeswitch code above to use `node-name()` instead of `name()` to handle namespaces. – Joe Wicentowski May 16 '16 at 14:56