
Is there any way to bulk load data using MLCP as a scheduled task in MarkLogic?

Mads Hansen
Kiran

2 Answers


Scheduled tasks inside MarkLogic can call external services (using HTTP), but they don't have a way to run an external command. You do have some options:

  • schedule the MLCP job externally, using cron on Linux or something along those lines;
  • restructure your load using JavaScript or XQuery; you can retrieve data from a file system, run it through some transforms, and insert it into the database using modules running in MarkLogic;
  • set up a Java app server, have your scheduled task make an HTTP request to that server, and have the Java app server call MLCP.

I think I'd start with the first option, but which one is best depends on your use case.
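For the first option, cron is the usual tool on Linux. If you would rather keep the scheduling in a small Java process (say, on a machine without cron), here is a minimal sketch using the JDK's `ScheduledExecutorService` to run `mlcp.sh import` once an hour. The class name, script path, host, port, credentials and input directory are all hypothetical placeholders; only the `import` command and its `-host`, `-port`, `-username`, `-password` and `-input_file_path` options come from MLCP itself.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a cron entry: periodically runs an MLCP import.
public class MlcpScheduler {

    // Builds an "mlcp.sh import" command line from connection details and an input directory.
    static List<String> buildImportCommand(String mlcpScript, String host, int port,
                                           String user, String password, String inputPath) {
        return List.of(mlcpScript, "import",
                "-host", host,
                "-port", Integer.toString(port),
                "-username", user,
                "-password", password,
                "-input_file_path", inputPath);
    }

    // Runs the command, streams its output to this process's console,
    // and returns the exit code (0 normally means success).
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical install path, connection details and input directory.
        List<String> cmd = buildImportCommand("/opt/mlcp/bin/mlcp.sh", "localhost", 8000,
                "admin", "admin", "/data/incoming");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Run once now, then every hour; scheduleAtFixedRate never starts a run
        // while the previous one is still going.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                System.out.println("mlcp exited with code " + run(cmd));
            } catch (IOException e) {
                System.err.println("could not start mlcp: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 1, TimeUnit.HOURS);
    }
}
```

The exceptions are caught inside the task because an uncaught exception would silently cancel all later runs of a `scheduleAtFixedRate` task.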

Sofia
Dave Cassel
  • Then how can the MarkLogic Task Scheduler perform the following bulk-loading functionality, which is mentioned in its documentation (https://docs.marklogic.com/guide/admin/scheduling_tasks): "Loading content. For example, periodically checking for new content from an external data source, such as a web site, web service, etc."? – Kiran Apr 20 '15 at 05:08
  • Checking an external data source like a web site or service can be done using functions like [xdmp:http-get()](https://docs.marklogic.com/xdmp:http-get). Scheduled tasks call main modules, so you could set up a scheduled task to http-get something (for instance, an RSS feed) and then store it in the database, doing some transformation if needed. – Dave Cassel Apr 20 '15 at 11:24
  • What I need to do is bulk load data into the MarkLogic database from the file system on my local PC using MLCP (MarkLogic Content Pump). Can this be done as a scheduled task using the MarkLogic Task Scheduler? – Kiran Apr 21 '15 at 06:48
  • Not directly. You'll need to set up Tomcat (or similar) with a service that will launch MLCP. MLCP is written in Java, so in your service code you can call directly into the MLCP JAR. The mlcp.sh script uses the `com.marklogic.contentpump.ContentPump` class; seems like a good starting point. Alternatively, you could use [Runtime.getRuntime().exec()](http://www.mkyong.com/java/how-to-execute-shell-command-from-java/) to execute the MLCP script itself. With that in place, your scheduled task will call that service using xdmp.httpGet(). – Dave Cassel Apr 21 '15 at 11:16
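A minimal sketch of that service, assuming a plain JDK process instead of Tomcat: it uses the JDK's built-in `com.sun.net.httpserver.HttpServer` and launches the MLCP script with `ProcessBuilder` (the modern equivalent of `Runtime.getRuntime().exec()`). The class name, the `/run-mlcp` path, port 8090 and the MLCP arguments are hypothetical. It answers 200 when MLCP exits with 0 and 500 otherwise.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical HTTP service that runs MLCP when a scheduled task calls it.
public class MlcpLauncherService {

    // Binds to loopback only, so just processes on this machine can trigger a load.
    static HttpServer start(int port, List<String> command) throws IOException {
        HttpServer server = HttpServer.create(
                new InetSocketAddress(InetAddress.getLoopbackAddress(), port), 0);
        server.createContext("/run-mlcp", exchange -> {
            int status;
            String body;
            try {
                // Blocks until MLCP finishes; the default executor handles one request at a time.
                int exit = new ProcessBuilder(command).inheritIO().start().waitFor();
                status = exit == 0 ? 200 : 500;
                body = "mlcp exit code: " + exit;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                status = 500;
                body = "interrupted";
            } catch (IOException e) {
                status = 500;
                body = "could not start mlcp: " + e.getMessage();
            }
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(status, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical MLCP install path, connection details and input directory.
        List<String> mlcp = List.of("/opt/mlcp/bin/mlcp.sh", "import",
                "-host", "localhost", "-port", "8000",
                "-username", "admin", "-password", "admin",
                "-input_file_path", "/data/incoming");
        start(8090, mlcp);
        System.out.println("Listening on http://127.0.0.1:8090/run-mlcp");
    }
}
```

The scheduled task's main module would then call something like `xdmp.httpGet("http://127.0.0.1:8090/run-mlcp")`. The loopback binding only works if MarkLogic runs on the same machine as the service; otherwise bind to a reachable address and add authentication. Because the response waits for MLCP to finish, a long load may exceed the caller's HTTP timeout, in which case returning immediately and running MLCP in the background is the safer design.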

You can't invoke mlcp via a scheduled task; I recommend trying something like Apache Camel for this.

Camel has a Timer component and a Quartz component, either of which can be used for scheduling.

And here's an example Camel file with a route (commented out, but still operable) that is started by a Timer, writes a file to disk, and then ingests it via mlcp: https://github.com/rjrudin/ml-camel-client/blob/master/src/main/resources/META-INF/camel-routes.xml

I've had good success with doing all kinds of processing/scheduling in Camel and then ultimately ingesting content via mlcp. I think it's a good fit for your use case here so you can leverage what mlcp does best - get content into MarkLogic as fast as possible.

rjrudin