
I am trying to use mlcp.bat to export the document with the following URI: /category/[2014] xxx.xml

This is the mlcp command used with parameters:

mlcp.bat export -host localhost -port 8000 -username admin -password admin -mode local -database database-content -output_file_path C:/mlcp/bin/xmlexport -document_selector '/CaseReport/Metadata[id="16594-SSP-M"]' -indented true

After executing the above command, no documents are exported :( Below is the mlcp output:

INFO contentpump.ContentPump: Job name: local_320491878_1
INFO mapreduce.MarkLogicInputFormat: Fetched 1 forest splits.
INFO mapreduce.MarkLogicInputFormat: Made 2 split(s).
INFO contentpump.LocalJobRunner:  completed 0%
INFO contentpump.LocalJobRunner: com.marklogic.mapreduce.MarkLogicCounter:
INFO contentpump.LocalJobRunner: ESTIMATED_INPUT_RECORDS: 35722
INFO contentpump.LocalJobRunner: INPUT_RECORDS: 0
INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 0
INFO contentpump.LocalJobRunner: Total execution time: 26 sec

== UPDATE == These are the first 3 lines of the XML document with URI /category/[2014] xxx.xml

<?xml version="1.0" encoding="UTF-8"?>
<CaseReport xlink:type="extended" category="unreported" neutralcitation="[2014] xxx" year="" volume="" series="" pageno="" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:exslt="http://exslt.org/common">
  <Metadata id="16594-SSP-M">
Eugene
  • document_selector is supposed to be an XPath. "To select documents matching an XPath expression, use `-document_selector`. To use namespace prefixes in the XPath expression, define the prefix binding using `-path_namespace`." What is `/category/[2014] xxx.xml` supposed to be? A URI of a single document? – Mads Hansen Oct 27 '22 at 11:51
  • Yup, /category/[2014] xxx.xml is the URI of an XML document. – Eugene Oct 28 '22 at 01:33
  • I'd try an options file as well instead of command-line arguments, just to rule out any issues with single/double quotes on the command line – rjrudin Oct 28 '22 at 11:36

1 Answer


The -document_selector option expects you to specify an XPath that would select documents from the database. You are providing the URI of a document.
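
If the goal is instead to select by the Metadata id, as in the command shown in the question, note that in the sample document id is an attribute of Metadata. In XPath, `id` on its own names a child element, while `@id` names the attribute, so the predicate in the question cannot match that document. A minimal sketch of the adjusted selector, written as an option line followed by its value line; it has not been run against this database, and whether mlcp then exports the document is an assumption:

```
-document_selector
/CaseReport/Metadata[@id="16594-SSP-M"]
```

The CaseReport element in the sample declares only prefixed namespaces, so no -path_namespace binding should be needed for this path.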

To select the document by its URI, use -query_filter with a query built from cts:document-query(): cts:document-query("/category/[2014] xxx.xml")

This is an example of that query serialized as XML:

-query_filter
<cts:document-query xmlns:cts="http://marklogic.com/cts"><cts:uri>/category/[2014] xxx.xml</cts:uri></cts:document-query>

This is an example of that query serialized as JSON:

-query_filter 
{"documentQuery":{"uris":["/category/[2014] xxx.xml"]}} 

To avoid quoting and escaping issues with the query on the command line, you would be better off putting these options into an options file and passing the path to that file with the -options_file option.
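
As a rough sketch of that approach, the options from the question could be moved into a file such as the one below. The file name export-options.txt is hypothetical, the connection values are copied from the question, and the layout assumes the usual mlcp options-file convention of one option or value per line, with `#` comment lines on their own:

```
# export-options.txt (hypothetical file name)
-host
localhost
-port
8000
-username
admin
-password
admin
-mode
local
-database
database-content
-output_file_path
C:/mlcp/bin/xmlexport
-query_filter
{"documentQuery":{"uris":["/category/[2014] xxx.xml"]}}
-indented
true
```

The export would then be started with: mlcp.bat export -options_file export-options.txt. Because the query sits on its own line in the file, the shell never sees its quotes, brackets, or spaces.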

Mads Hansen