I am running a CoRB job, but it fails with an error about the URI variable. Below are the CoRB properties and the module code.

THREAD-COUNT=4
URIS-MODULE=get-uri.xqy
PROCESS-MODULE=report.xqy
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
EXPORT-FILE-NAME=report.xml
PRE-BATCH-MODULE=preProces.xqy
PRE-BATCH-TASK=com.marklogic.developer.corb.PreBatchUpdateFileTask

get-uri.xqy code:

let $uris := cts:uris((), (), cts:collection-query("InvoiceHistory"))
return (count($uris), $uris)

preProces.xqy code:

declare variable $URI as xs:string external;

(: Retrieve all relevant records from the current document :)
let $records := fn:doc($URI)/records

(: Group records by creation_date and calculate the sum of doc_count for each group :)
let $grouped-records :=
  for $date in distinct-values($records//document/@creation_date)
  let $total := sum($records/document[@creation_date = $date]/@doc_count/xs:integer(.))
  return <group date="{$date}" total-docs="{$total}"/>

(: Wrap the grouped records in a results element and store it at a fixed temporary URI :)
let $temp-doc := <results>{ $grouped-records }</results>

return xdmp:document-insert("/temp/preprocessed.xml", $temp-doc)

report.xqy code:

declare namespace fn = "http://www.w3.org/2005/xpath-functions";
declare variable $URI as xs:string external;

(: Retrieve the preprocessed data :)
let $preprocessed := doc("/temp/preprocessed.xml")/results

(: Generate the final report :)
let $final-report :=
  <results>{
    for $group in $preprocessed/group
    order by $group/@date
    return <result date="{$group/@date}" total-docs="{$group/@total-docs}"/>
  }</results>

return $final-report

CORB Error:

com.marklogic.developer.corb.CorbException: Undefined external variable at URI:
    at com.marklogic.developer.corb.AbstractTask.wrapProcessException(AbstractTask.java:426)
    at com.marklogic.developer.corb.AbstractTask.handleRequestException(AbstractTask.java:373)
    at com.marklogic.developer.corb.AbstractTask.invokeModule(AbstractTask.java:202)
    at com.marklogic.developer.corb.PreBatchUpdateFileTask.call(PreBatchUpdateFileTask.java:63)
    at com.marklogic.developer.corb.PreBatchUpdateFileTask.call(PreBatchUpdateFileTask.java:30)
    at com.marklogic.developer.corb.Manager.runPreBatchTask(Manager.java:790)
    at com.marklogic.developer.corb.Manager.populateQueue(Manager.java:857)
    at com.marklogic.developer.corb.Manager.run(Manager.java:603)
    at com.marklogic.developer.corb.Manager.main(Manager.java:140)
Caused by: com.marklogic.xcc.exceptions.XQueryException: XDMP-EXTVAR: (err:XPDY0002) declare variable $URI as xs:string external; -- Undefined external variable fn:QName("","URI")

Where is it going wrong? Can anyone please suggest a fix?

Mads Hansen

1 Answer

The PRE-BATCH-MODULE runs once, before CoRB starts processing the URIs that were selected. It is not passed a $URI the way the PROCESS-MODULE is.

So preProces.xqy has no value bound to its $URI external variable, and it throws the XDMP-EXTVAR error when it is executed.

It looks like preProces.xqy should actually be the PROCESS-MODULE, which is invoked once for each of the selected URIs. However, you are inserting the results at a static URI, so each invocation would overwrite the content written by the previous one (and it would be very slow, since lock contention on that URI would effectively make the job single-threaded).

If you are trying to generate a consolidated XML report, consider returning the XML fragment, i.e. return <results>{ $grouped-records }</results>, instead of inserting it into the database. Then set PRE-BATCH-MODULE=INLINE-XQUERY|'<results>' and POST-BATCH-MODULE=INLINE-XQUERY|'</results>', and all of the content will be written to the report.xml output file by the ExportBatchToFileTask, with properties like these:

THREAD-COUNT=4
URIS-MODULE=get-uri.xqy
PROCESS-MODULE=preProces.xqy
PROCESS-TASK=com.marklogic.developer.corb.ExportBatchToFileTask
EXPORT-FILE-NAME=report.xml
PRE-BATCH-MODULE=INLINE-XQUERY|'<results>'
PRE-BATCH-TASK=com.marklogic.developer.corb.PreBatchUpdateFileTask
POST-BATCH-MODULE=INLINE-XQUERY|'</results>'
POST-BATCH-TASK=com.marklogic.developer.corb.PostBatchUpdateFileTask
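As a minimal sketch of what the PROCESS-MODULE could look like under that configuration: this assumes `document` elements are direct children of `records` (as in the sum expression from the question), and it emits bare `result` elements on the assumption that the enclosing `<results>` tags come from the pre- and post-batch strings.

```xquery
xquery version "1.0-ml";

(: CoRB binds $URI once per URI returned by the URIS-MODULE :)
declare variable $URI as xs:string external;

(: Assumption: document elements are direct children of records :)
let $records := fn:doc($URI)/records
for $date in fn:distinct-values($records/document/@creation_date)
let $total := fn:sum($records/document[@creation_date = $date]/@doc_count/xs:integer(.))
(: Returned rather than inserted, so ExportBatchToFileTask can write it to report.xml :)
return <result date="{$date}" total-docs="{$total}"/>
```

Each invocation only sees a single document, so the same date can still appear several times in report.xml, once per document that contains it.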

CoRB will execute the process module and return results for each $URI that is selected by the URIS-MODULE. If you are looking to generate an aggregate result across all of the documents, then you would need to process the result XML file that is generated from this CoRB job.
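One way to do that second pass, sketched in plain XQuery. The inline `$report` value is made-up sample data standing in for the exported report.xml, and the `result` / `total-docs` names assume the process module emits elements of that shape; how you load the real file is up to you.

```xquery
xquery version "1.0";

(: Hypothetical sample standing in for the report.xml written by the CoRB job :)
let $report :=
  <results>
    <result date="2023-08-01" total-docs="3"/>
    <result date="2023-08-02" total-docs="5"/>
    <result date="2023-08-01" total-docs="4"/>
  </results>
return
  <results>{
    (: Collapse the per-document rows into one row per date :)
    for $date in fn:distinct-values($report/result/@date)
    let $total := fn:sum($report/result[@date = $date]/@total-docs/xs:integer(.))
    order by $date
    return <result date="{$date}" total-docs="{$total}"/>
  }</results>
```

For the sample above this yields one `result` for 2023-08-01 with total-docs="7" and one for 2023-08-02 with total-docs="5".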

However, if you created range indexes on those two attributes (creation_date and doc_count), then you could generate the aggregate report in a single query with something like this:

let $counts-by-date := 
  cts:value-co-occurrences(
    cts:element-attribute-reference(xs:QName("document"), xs:QName("creation_date")), 
    cts:element-attribute-reference(xs:QName("document"), xs:QName("doc_count")), 
    "map")
return
  <results>{
    for $date in map:keys($counts-by-date)
    return <result date="{$date}" ingested-docs="{sum(map:get($counts-by-date, $date))}"/>  
  }</results>
Mads Hansen
  • Thanks for your reply. Can you please confirm which query I have to paste into PRE-BATCH-MODULE=INLINE-XQUERY| and POST-BATCH-MODULE=INLINE-XQUERY|: report.xqy or preProces.xqy? I have removed xdmp:document-insert("/temp/preprocessed.xml", $temp-doc) from preProces.xqy so that it only returns the result, and that is where my confusion comes from. – Dharmendra Kumar Singh Aug 21 '23 at 06:34
  • Here is the updated preProces.xqy module: let $records := fn:doc($URI)/records let $grouped-records := for $date in distinct-values($records//document/@creation_date) let $total := sum($records/document[@creation_date = $date]/@doc_count/xs:integer(.)) return <group date="{$date}" total-docs="{$total}"/> let $temp-doc := <results>{ $grouped-records }</results> return $temp-doc – Dharmendra Kumar Singh Aug 21 '23 at 08:00
  • What should I pass in PRE-BATCH-MODULE=INLINE-XQUERY| and POST-BATCH-MODULE=INLINE-XQUERY|? Please suggest. – Dharmendra Kumar Singh Aug 21 '23 at 08:01
  • input sample expected output : – Dharmendra Kumar Singh Aug 21 '23 at 08:45
  • Sorry, forgot that you need to wrap the start and end result tags in quotes, so that it just sees them as a string and doesn't evaluate them as XQuery and complain about incomplete XML. Also, need to add the POST-BATCH-TASK so that it writes the closing `</results>` – Mads Hansen Aug 21 '23 at 11:40
  • @Adams, thanks, it ran, but I didn't get the expected result. I have pasted my preProces.xqy, sample input, and expected output in the comments above. Can you please take a look? – Dharmendra Kumar Singh Aug 21 '23 at 11:49
  • If your expected results should have a `<result>` element, then you should change your `$grouped-records` to return `result` instead of `group`. And if you are going to generate the opening/closing `results` tags with the batch modules, then there is no need to have them in the process module (if the desire is one `results` that contains many `result` children in the final output). – Mads Hansen Aug 21 '23 at 12:02
  • If you look at my expected result, it is based on a sum: for each creation_date attribute value I sum the doc_count attribute (e.g. doc_count="3"). The problem is that CoRB does the calculation separately for each file, but I want a single result for each date. Please have a look at my input and expected output. – Dharmendra Kumar Singh Aug 21 '23 at 12:15
  • When I run your input doc through that transform, it produces the desired output. If you want to calculate the sum of dates across many docs, then could probably do this more efficiently with indexes in a single query, or may need to run something against the XML that this generates to compute that. – Mads Hansen Aug 21 '23 at 12:19
  • current output expected – Dharmendra Kumar Singh Aug 21 '23 at 12:21
  • declare namespace fn = "http://www.w3.org/2005/xpath-functions"; let $records := cts:search(/records, cts:collection-query("InvoiceHistory"), (), ()) return let $report := { let $distinct-dates := distinct-values($records//document/@creation_date) for $date in $distinct-dates let $total := sum( $records/document[@creation_date = $date]/@doc_count/xs:integer(.)) order by $date return } return $report This query gave me the correct output when run in Query Console. – Dharmendra Kumar Singh Aug 21 '23 at 12:25
  • But the above query times out in Query Console, which is why I was looking at CoRB. Can you please suggest a better way to do it? – Dharmendra Kumar Singh Aug 21 '23 at 12:26