I have a collection (ABR) with 1.5 million documents

I want to create a custom flow to process these documents; however, in the 1st instance, I only want the flow to process 10 documents so I can test and assess my custom code.

In the past (prior to v5) I would do this with two lines:

let uriCollection = cts.uris(null, null, cts.collectionQuery("ABR"))
fn.subsequence(uriCollection, 0, 10)

Now in version 5 you place this search in the custom flow.

If I set the Source Type to Collection, set the Source Collection to ABR, and run the flow, all is well. However, I like to build the JavaScript incrementally, partly because it is not my strength and partly because I simply prefer building things step by step.

In theory, I should be able to put:

fn.sequence(cuts.uris(null,null,cts(collectionQuery("ABR")),1,10)

In Query Console, it works as a search.

It does not work in Data Hub, where I get the following error:

java.lang.RuntimeException: org.springframework.web.client.HttpClientErrorException$BadRequest: 400 Bad Request
    at com.marklogic.hub.collector.impl.CollectorImpl.run(CollectorImpl.java:145)
    at com.marklogic.hub.step.impl.QueryStepRunner.runCollector(QueryStepRunner.java:318)
    at com.marklogic.hub.step.impl.QueryStepRunner.run(QueryStepRunner.java:260)
    at com.marklogic.hub.flow.impl.FlowRunnerImpl$FlowRunnerTask.run(FlowRunnerImpl.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.springframework.web.client.HttpClientErrorException$BadRequest: 400 Bad Request
    at org.springframework.web.client.HttpClientErrorException.create(HttpClientErrorException.java:79)
    at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:122)
    at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:102)
    at com.marklogic.rest.util.MgmtResponseErrorHandler.handleError(MgmtResponseErrorHandler.java:26)
    at org.springframework.web.client.ResponseErrorHandler.handleError(ResponseErrorHandler.java:63)
    at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:778)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:736)
    at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:710)
    at com.marklogic.hub.collector.impl.CollectorImpl.run(CollectorImpl.java:139)
    ... 6 more

I am assuming that datahub purposefully restricts that type of search and expects you to restrict the number of documents in other ways.

Any advice is welcome.

Mads Hansen
  • PS the data was ingested using mlcp so there isn't an ingest flow. – carwoolaman Nov 22 '19 at 03:09
  • Your code snippet has `fn.sequence`, instead of `fn.subsequence` and `cuts.uris` instead of `cts.uris`, and `cts(collectionQuery("ABR"))` instead of `cts.collectionQuery("ABR")`. Were those just typos in your question, or is that the actual code you are trying to execute instead of `fn.subsequence(cts.uris(null,null,cts.collectionQuery("ABR")),1,10)` – Mads Hansen Dec 02 '19 at 18:10

1 Answer

Data Hub Framework v5 has its own built-in collector endpoint that reads the "sourceQuery" property. The "sourceQuery" property takes a cts.uris() query.

You can do a limit on a cts.uris() query.

This is what you could use instead of fn.subsequence:

cts.uris(null, "limit=10", cts.collectionQuery("ABR"))

The second argument to cts.uris() takes options, and limit is one of them. You might also want skip=n, which plays the role of the starting-position argument of fn.subsequence. See the docs for all the other options: http://docs.marklogic.com/cts.uris
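
If you want to change the page size between test runs, one way is to generate the "sourceQuery" string instead of typing it by hand. The following is only a sketch in plain JavaScript: buildSourceQuery is a made-up helper name, not part of Data Hub, and it only produces the text you would paste into the step's "sourceQuery" property.

```javascript
// Hypothetical helper (not a Data Hub API): builds the text of a cts.uris()
// call with optional "limit=N" and "skip=N" options.
function buildSourceQuery(collection, limit, skip) {
  const options = [];
  if (Number.isInteger(limit) && limit > 0) {
    options.push(`limit=${limit}`);
  }
  if (Number.isInteger(skip) && skip > 0) {
    options.push(`skip=${skip}`);
  }
  // With no options, pass null as the second argument of cts.uris().
  const optionsArg = options.length > 0 ? JSON.stringify(options) : "null";
  return `cts.uris(null, ${optionsArg}, cts.collectionQuery(${JSON.stringify(collection)}))`;
}

// First 10 URIs of the ABR collection.
console.log(buildSourceQuery("ABR", 10));
// Next 10 URIs, skipping the first 10.
console.log(buildSourceQuery("ABR", 10, 10));
```

The generated string still starts with cts.uris(, so it keeps the shape the collector expects.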

If cts.uris() is not enough, you'll have to create your own collector endpoint to be called first. I ran into this issue and copied the built-in endpoint so that it can also take a module path. In case that happens to you, here's the code we used. If the "sourceQuery" property is a URI that exists in the modules database, the endpoint invokes that module, passing in anything in the options.

const DataHub = require("/data-hub/5/datahub.sjs");
const datahub = new DataHub();

function get(context, params) {

    const flowName = params["flow-name"];
    const options = params.options ? JSON.parse(params.options) : {};

    let step =  params["step"];

    if (!step) {
      step = 1;
    }

    let flowDoc = datahub.flow.getFlow(flowName);

    if (!fn.exists(flowDoc)) {
      context.outputStatus = [500, 'error'];
      fn.error(null, "RESTAPI-SRVEXERR", Sequence.from([404, "Not Found", "The requested flow was not found"]));
    }
    let stepDoc = flowDoc.steps[step];
    if (!stepDoc) {
        context.outputStatus = [500, 'error'];
        fn.error(null, "RESTAPI-SRVEXERR", Sequence.from([404, "Not Found", `The step number "${step}" of the flow was not found`]));
    }
    let baseStep = datahub.flow.step.getStepByNameAndType(stepDoc.stepDefinitionName, stepDoc.stepDefinitionType);
    if (!baseStep) {
        context.outputStatus = [500, 'error'];
        fn.error(null, "RESTAPI-SRVEXERR", Sequence.from([404, "Not Found", `A step with name "${stepDoc.stepDefinitionName}" and type of "${stepDoc.stepDefinitionType}" was not found`]));
    }
    let combinedOptions = Object.assign({}, baseStep.options, flowDoc.options, stepDoc.options, options);
    const collectorDatabase = combinedOptions.collectorDatabase || params.collectorDatabase;

    const modulesDatabase = combinedOptions.modulesDatabase || xdmp.databaseName(xdmp.modulesDatabase())

    if(!combinedOptions.sourceQuery && flowDoc.sourceQuery) {
      combinedOptions.sourceQuery = flowDoc.sourceQuery;
    }

    let query = combinedOptions.sourceQuery;

    if (!query) {
      datahub.debug.log("The collector query was empty");
      context.outputStatus = [500, 'error'];
      fn.error(null, "RESTAPI-SRVEXERR", Sequence.from([404, "Not Found", "The collector query was empty"]));
    }

    let results;
    try {
        let urisEval;
        const isModule = 
        fn.head(xdmp.eval(`
                var uri; 
                fn.docAvailable(uri)
                `, 
                {uri: query}, 
                {database: xdmp.database(modulesDatabase)}
            ));

        if (isModule) {
            results = xdmp.invoke(query, {options: options}, {database: xdmp.database(collectorDatabase), modules:xdmp.database(modulesDatabase) });
        } else {
            if (/^\s*cts\.uris\(.*\)\s*$/.test(query)) {
                urisEval = query;
            } else {
                urisEval = "cts.uris(null, null, " + query + ")";
            }
            results =  xdmp.eval(urisEval, {options: options}, {database: xdmp.database(collectorDatabase)});
        }

      context.outputStatus = [200, 'okay'];
    } catch (err) {
      datahub.debug.log(err);
      // Set the status before calling fn.error, which throws and ends execution.
      context.outputStatus = [500, 'error'];
      fn.error(null, 'RESTAPI-INVALIDREQ', err);
    }

    context.outputTypes = ['application/json'];

    return results;
}
exports.GET = get;
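
To see what the non-module branch of the endpoint above does with a given "sourceQuery", here is a small sketch that repeats the same regex check in plain JavaScript so it can be tried outside MarkLogic. toUrisExpression is a made-up name for illustration; it only builds the string that would be passed to xdmp.eval and does not run any query.

```javascript
// Repeats the wrapping rule from the endpoint above (illustrative name):
// a query that already is a cts.uris(...) call is used as-is, anything else
// is treated as a cts query and wrapped in cts.uris(null, null, ...).
function toUrisExpression(query) {
  if (/^\s*cts\.uris\(.*\)\s*$/.test(query)) {
    return query;
  }
  return "cts.uris(null, null, " + query + ")";
}

// A plain cts query gets wrapped.
console.log(toUrisExpression('cts.collectionQuery("ABR")'));

// A cts.uris() call with a limit passes through unchanged.
console.log(toUrisExpression('cts.uris(null, "limit=10", cts.collectionQuery("ABR"))'));

// An fn.subsequence(...) call does not start with cts.uris(, so it is wrapped
// as if it were a cts query, which is not a valid argument for cts.uris().
console.log(toUrisExpression('fn.subsequence(cts.uris(null, null, cts.collectionQuery("ABR")), 1, 10)'));
```

Assuming the built-in collector applies a similar rule, the last case may be why the fn.subsequence attempt is rejected in Data Hub while the same expression runs fine in Query Console.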
Tyler Replogle