
I just want to list the names of all documents in a forest.

I know the forest name (ABC), and I need to find all documents in that forest. My output should look like this:

Forest ABC has

A.xml
B.xml
C.xml

and so on...

G Irala

2 Answers


Searches and lexicon lookups can be constrained by forest, so you should be able to get the document names from the URI lexicon with a call similar to the following:

cts.values(cts.uriReference(), null, null, null, null, xdmp.forest('ABC'))

That said, there aren't many common motivations for looking up the names of documents in a forest. What are you trying to accomplish?

ehennum
  • The actual problem is that one of the disks in ML failed and a couple of forests (let's say x, y) were restored. In that process, x and y ended up with duplicates of docs that are in the remaining forests (a, b, c, ...). So when I load docs into ML, I get errors saying the doc exists in forest a and forest x, or in forest b and forest x. I am deleting the dup docs from forest x and reloading. Since I have the docs I am going to load in hand, if I can find the names of the docs in forest x, I can delete the dup docs before I actually load. I am not sure if I am thinking about this correctly. – G Irala Apr 24 '18 at 15:50
  • I believe my question should have been "how to find and delete all duplicate documents in ML forests". – G Irala Apr 24 '18 at 15:52
  • https://help.marklogic.com/knowledgebase/article/View/22/0/handling-xdmp-dbdupuri-errors – Mads Hansen Apr 24 '18 at 16:58
  • @Mads Hansen - I have been following the same link to delete the duplicate docs. In my case there are hundreds of dup docs in forest x, which I only find after loading the content. Instead of finding the dup docs after loading, is there any way to find them by comparing the docs across forests and delete them in bulk? – G Irala Apr 24 '18 at 17:44

In order to list all of the URIs from a particular forest, you can use cts:uris() and specify the forest-id in the 5th parameter:

cts:uris((), (), cts:true-query(), (), xdmp:forest("ABC"))

Your comment suggested that the reason why you are attempting to list all of the URIs from a particular forest was so that you could delete the ones that are duplicates.

The code below could be used to obtain all of the URIs from the specified forest, and then remove them from that forest if they are duplicates.

If you attempt to read the document properties and an XDMP-DBDUPURI exception is thrown, catch that exception and then delete the document from the problem forest in a different transaction.

(: update this with the name of problem forest :)
declare variable $PROBLEM-FOREST := xdmp:forest("ABC");
(: all URIs stored in the problem forest, from the URI lexicon :)
declare variable $URIS := cts:uris((), (), cts:true-query(), (), $PROBLEM-FOREST);

for $uri in $URIS
return
  try {
    (: reading the properties fails with XDMP-DBDUPURI when the URI is duplicated :)
    let $properties := xdmp:document-get-properties($uri, xs:QName("foo"))
    return ()
  } catch($e) {
    if ($e/error:code = "XDMP-DBDUPURI") then
      (: delete the copy held in the problem forest, in a separate transaction :)
      xdmp:invoke-function(
        function(){ xdmp:document-delete($uri) },
        <options xmlns="xdmp:eval">
          <isolation>different-transaction</isolation>
          <database>{$PROBLEM-FOREST}</database>
        </options>
      )
    else ()
  }

Depending on how many documents are in this forest, you may run into timeout issues. You might consider running this as a CORB job where the forest's URIs are selected in the URIS-MODULE and then each inspection/delete is handled individually in the PROCESS-MODULE.
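
If you would rather work out the duplicates up front by comparing URI lists, the comparison itself is simple once the lists are in hand. Below is a minimal sketch in plain JavaScript, assuming you have already fetched the URIs of the problem forest and of the remaining forests (for example with forest-constrained lexicon calls like the ones above). The function names findDuplicateUris and toBatches, the sample URIs, and the batch size are all hypothetical, and nothing here calls MarkLogic.

```javascript
// Plain JavaScript sketch: no MarkLogic APIs are called here.
// Assumption: both URI lists were fetched beforehand, e.g. with
// forest-constrained lexicon lookups.

// Return the URIs of the problem forest that also exist in another forest.
function findDuplicateUris(problemForestUris, otherForestUris) {
  const seen = new Set(otherForestUris);
  return problemForestUris.filter((uri) => seen.has(uri));
}

// Split a list into fixed-size batches so each delete request stays small.
function toBatches(items, batchSize) {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical sample data, not taken from a real database.
const forestX = ["A.xml", "B.xml", "C.xml", "D.xml", "F.xml"];
const otherForests = ["B.xml", "D.xml", "E.xml", "F.xml"];

const duplicates = findDuplicateUris(forestX, otherForests);
const batches = toBatches(duplicates, 2);

console.log(duplicates);
console.log(batches);
```

Each batch could then be deleted from the problem forest in its own request, which keeps any single transaction short and may help with the timeouts.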

Mads Hansen
  • Thanks so much. I tried your code as it is and it ran into timeout issues. I didn't see your latest comment. So what I did was pull out all the duplicated docs that were caught in the exception block and delete them all at once using the code given in the link: https://help.marklogic.com/knowledgebase/article/View/22/0/handling-xdmp-dbdupuri-errors – G Irala Apr 26 '18 at 14:52