2

Question on using cts:search over FLOWR. I have a xqy that runs over all docs in the database and checks an element that has a timestamp. We created that timestamp on insertion of the doc. Goal is to delete documents older then x days.

Now I need to know how many docuements I have that are older then x days so I can try a CORB job to delete them.

My query so far works:

xquery version "1.0-ml";
declare namespace j = "http://marklogic.com/xdmp/json/basic";
declare namespace dikw = 'http://www.example.com/dikw_functions.xqy';

(:let $foo := cts:uris((),(), cts:not-query(cts:element-query(xs:QName("j:dikwmetadata"), cts:element-query(xs:QName("j:data"), cts:and-query(()))))):)
let $uris := cts:uri-match("/twitter/*")[1 to 10]
let $today := fn:current-date()
let $days := xs:dayTimeDuration("P30D")

let $today_minus_x := xs:dateTime($today - $days)

for $uri in $uris (:cts:search(doc(), $random-query):)
    let $doc_dikw_date := xdmp:parse-dateTime("[Y0001]-[M01]-[D01] [h01]",xs:string(fn:doc($uri)//j:dikwmetadata//j:timestamp))
    let $to_old := if ($today_minus_x >= $doc_dikw_date)
    then
        true() (: deleted document:)
    else
        false()

return ($uri,$to_old) 

This works ok but I need to know how many there are to see if I can run it from the query console or that I need to set up a sheduled CORB job running every day.

I was looking into cts:search something like:

(:
let $uris2 := cts:search($uris,cts:query(xdmp:parse-dateTime("[Y0001]-[M01]-[D01] [h01]",xs:string(fn:doc($uris)//j:dikwmetadata//j:timestamp))) < $today_minus_x)
:)

But this seems to need elements ... no I am stuck.

Questions: is there a more straightforward way to find and count all documents older then x days?

Paul Sweatte
  • 24,148
  • 7
  • 127
  • 265
Hugo Koopmans
  • 1,349
  • 1
  • 15
  • 27

1 Answers1

2

One of the problem with your current code is that you are parsing dates at run-time. That is always going to be slow, because it needs access to the XML itself.

This would work best if your j:timestamp element would contain a string matching xs:date or xs:dateTime. Then you can declare a (path) range index on that element of type date/dateTime (whatever suits you best).

Alternative is to create something like iso-date(Time) attribute on that element containing a preparsed date of type xs:date(Time), so you can index that one.

Once you have a range index, you can do a (path-)range-query on your element. You could then also use cts:uris to get the docs that need to be deleted..

HTH!

grtjn
  • 20,254
  • 1
  • 24
  • 35
  • thx for your reply, not sure if I can use this timestamp element directly the way you describe? `2014-02-03 21:16:05.348159` – Hugo Koopmans Feb 04 '14 at 10:21
  • if i do `let $d2 := xs:dateTime(fn:doc($uri)//j:dikwmetadata//j:timestamp)` i get `XDMP-CAST: (err:FORG0001) xs:dateTime(fn:doc($uri)/descendant::j:dikwmetadata/descendant::j:timestamp) -- Invalid cast: xs:untypedAtomic("2014-02-03 21:15:35.036983") cast as xs:dateTime on line 17 expr: xs:dateTime(fn:doc($uri)/descendant::j:dikwmetadata/descendant::j:timestamp)` ... so i guess i need to manipulate the documents first to make the timestamp string match? – Hugo Koopmans Feb 04 '14 at 10:27
  • ah ok i found out this timestamp was created by python script on document insert... so other timestamp in different element is directly accessable. need to change the python code so it puts timestamp in ML format. thx! – Hugo Koopmans Feb 04 '14 at 10:40
  • 1
    @HugoKoopmans: You only need to replace the space with 'T', and it becomes castable to xs:dateTime: `xs:dateTime(2014-02-03T21:16:05.348159)` – grtjn Feb 04 '14 at 14:43