
I have to copy an entire project folder into the MarkLogic server, and instead of doing it manually I decided to do it with a recursive function, but it is becoming the worst idea I have ever had. I'm having problems with the transactions and with the syntax, and being new to this I can't find a proper way to solve it. Here's my code; thank you for the help!

import module namespace dls = "http://marklogic.com/xdmp/dls" at "/MarkLogic/dls.xqy";

declare option xdmp:set-transaction-mode "update";

declare function local:recursive-copy($filesystem as xs:string, $uri as xs:string)
{
  for $e in xdmp:filesystem-directory($filesystem)/dir:entry
  return 
    if($e/dir:type/text() = "file")
        then dls:document-insert-and-manage($e/dir:filename, fn:false(), $e/dir:pathname)
    else
      (
          xdmp:directory-create(concat(concat($uri, data($e/dir:filename)), "/")),
          local:recursive-copy($e/dir:pathname, $uri)
      )

};

let $filesystemfolder := 'C:\Users\WB523152\Downloads\expath-ml-console-0.4.0\src'
let $uri := "/expath_console/"

return local:recursive-copy($filesystemfolder, $uri)
MissArmstrong
  • How many documents are you copying? The set of good solutions narrows if your data set is very large. Also, are you sure you need dls? Would you be OK with good old xdmp:document-insert? – Sam Mefford Aug 29 '17 at 15:58
  • @SamMefford Well, I'm trying to copy an entire nested project that acts as a UI console for viewing the hierarchy of files and folders inside the server itself, so no, using only xdmp:document-insert won't help. No, I'm not sure about dls; I'm just trying to find a solution. I also tried the mlcp command, but it can't reach the server. – MissArmstrong Aug 29 '17 at 19:07

1 Answer


MLCP would have been nice to use. However, here is my version:

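declare option xdmp:set-transaction-mode "update";

(: Filesystem prefix to strip, and the database URI prefix that replaces it :)
declare variable $prefix-replace := ('C:/', '/expath_console/');

declare function local:recursive-copy($filesystem as xs:string){
   for $e in xdmp:filesystem-directory($filesystem)/dir:entry
    return 
      if($e/dir:type/text() = "file")
         then 
           let $source := $e/dir:pathname/text()
           (: Derive the URI from the full path, not just the filename,
              so files with the same name in different folders stay distinct :)
           let $dest := fn:replace($source, $prefix-replace[1], $prefix-replace[2]) 
           (: xdmp:document-load reads the file from disk and stores it at $dest;
              xdmp:document-insert would instead store the options element itself :)
           let $_ := xdmp:document-load($source,
              <options xmlns="xdmp:document-load">
                <uri>{$dest}</uri>
              </options>)
           (: Report what was copied where, for later inspection :)
           return <record>
                     <from>{$source}</from>
                     <to>{$dest}</to>
                  </record>
         else
           (: Recurse into subdirectories :)
           local:recursive-copy($e/dir:pathname)

};

let $filesystemfolder := 'C:\Temp'

return <results>{local:recursive-copy($filesystemfolder)}</results>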

Please note the following:

  • I changed my sample to the C:\Temp dir
  • The output is XML only because, by convention, I return a record of what was done in case I want to analyze the results. That is actually how I found the cause of the conflicting-updates error.
  • I chose to define a simple prefix replacement to turn filesystem paths into database URIs
  • I saw no need for DLS in your description
  • I saw no need for the explicit creation of directories in your use case
  • The reason you were getting conflicting updates is that you were using just the filename as the URI. Across the whole directory structure these names were not unique, so the same URI was inserted twice in one transaction, hence the conflicting update.
  • This is not solid code:
    • You would have to ensure that each URI is valid. Not every filesystem path or name is acceptable as a URI, so you would want to test for this and escape characters where needed.
    • Copying a large filesystem in a single request would likely time out, so spawning the work in batches may be useful.
      • As an example, I might first gather the list of documents (as in my XML output) and then process that list by spawning a new task for every 100 documents. This could be done with a simple loop over xdmp:spawn-function or with a library such as taskbot by @mblakele.
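The batching and URI-escaping ideas in the last notes could look roughly like the following untested sketch. It assumes MarkLogic 8 or later (for inline functions and xdmp:spawn-function), a source folder given without a trailing slash, and that every dir:pathname starts with that folder. The helper names local:list-files and local:path-to-uri, the $root and $uri-prefix values, and the choice of fn:encode-for-uri on each path segment are illustrative assumptions, not something the answer prescribes; the batch size of 100 matches the note.

```xquery
xquery version "1.0-ml";

(: Hypothetical settings: adjust to your own folder and URI prefix.
   $root must not end with a slash. :)
declare variable $root := "C:/Temp";
declare variable $uri-prefix := "/expath_console/";
declare variable $batch-size := 100;

(: Recursively collect the full pathname of every file under $dir.
   This only reads the filesystem; nothing is written here. :)
declare function local:list-files($dir as xs:string) as xs:string*
{
  for $e in xdmp:filesystem-directory($dir)/dir:entry
  return
    if ($e/dir:type = "file")
    then fn:string($e/dir:pathname)
    else local:list-files(fn:string($e/dir:pathname))
};

(: Hypothetical helper: map a filesystem path to a database URI.
   Backslashes become slashes, the $root folder is stripped (assumes the
   path starts with it), and each remaining segment is percent-encoded
   so spaces and other unsafe characters do not end up in the URI. :)
declare function local:path-to-uri($path as xs:string) as xs:string
{
  let $relative := fn:substring(fn:translate($path, "\", "/"),
                                fn:string-length($root) + 2)
  return fn:concat(
    $uri-prefix,
    fn:string-join(
      for $segment in fn:tokenize($relative, "/")[. ne ""]
      return fn:encode-for-uri($segment),
      "/"))
};

let $files := local:list-files($root)
(: Compute every URI up front so the spawned closures capture plain strings :)
let $uris := for $f in $files return local:path-to-uri($f)
let $batches := xs:integer(fn:ceiling(fn:count($files) div $batch-size))
for $i in 1 to $batches
let $start := ($i - 1) * $batch-size + 1
let $batch-files := fn:subsequence($files, $start, $batch-size)
let $batch-uris := fn:subsequence($uris, $start, $batch-size)
(: Queue one task per batch on the Task Server; each runs as its own update :)
return xdmp:spawn-function(
  function() {
    for $path at $j in $batch-files
    return xdmp:document-load($path,
      <options xmlns="xdmp:document-load">
        <uri>{$batch-uris[$j]}</uri>
      </options>)
  },
  <options xmlns="xdmp:eval">
    <update>true</update>
  </options>)
```

Because each spawned task is a separate request, a failure in one batch does not roll back the others, and the main query returns as soon as the tasks are queued rather than waiting for the copy to finish.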
  • I recognize the code is not of the best quality and I understand its limits, but since the project is pretty small I didn't think about large filesystems. On the other hand, the project is not small enough to copy by hand, so this was an attempt to automate the copy. I also tried mlcp, but I'm currently having trouble telling it which database to copy the files into; the only workaround I found is to change the database referenced by the default server, App-Services. I know it's a pretty brutal approach, but being new to this I'm tackling problems however I can. – MissArmstrong Aug 30 '17 at 15:47
  • Anyway, thank you for your solution, I will try it soon! – MissArmstrong Aug 30 '17 at 15:47
  • mlcp can be run against port 8000 and has an -output_database option. See section 2.4 here: https://docs.marklogic.com/guide/mlcp.pdf – David Ennis -CleverLlamas.com Aug 30 '17 at 18:49
  • The MarkLogic guide on the website describes an option to run the command against a specific port, but it's masked in my setup, so I can only run it on port 8000. Thank you for the other option, though; it seems a reasonable way to solve the problem I found! – MissArmstrong Sep 06 '17 at 13:22