Is there any way to bulk load data using MLCP as a scheduled task in MarkLogic?
2 Answers
Scheduled tasks inside MarkLogic can call external services (using HTTP), but they don't have a way to run an external command. You do have some options:

- schedule the MLCP job externally, using cron on Linux or something along those lines;
- restructure your load using JavaScript or XQuery; you can retrieve data from a file system, run it through some transforms, and insert it into the database using modules running in MarkLogic;
- set up a Java app server, have your scheduled task make an HTTP request to that server, and have the Java app server call MLCP.
I think I'd start with the first option, but which one is best depends on your use case.
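As a rough illustration of the first option, here is a minimal Java sketch that plays the role of cron: it builds an `mlcp.sh import` command and runs it once an hour with a `ScheduledExecutorService`. The script path, host, port, credentials, and input directory are hypothetical placeholders. `-host`, `-port`, `-username`, `-password`, `-input_file_path`, and `-mode` are standard MLCP import options, but check them against the MLCP version you have installed.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledMlcpLoad {

    // Builds the argument list for one "mlcp.sh import" run.
    // All values passed in are placeholders supplied by the caller.
    public static List<String> buildImportCommand(String mlcpScript, String host, int port,
                                                  String user, String password, String inputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(mlcpScript);
        cmd.add("import");
        cmd.add("-host");            cmd.add(host);
        cmd.add("-port");            cmd.add(Integer.toString(port));
        cmd.add("-username");        cmd.add(user);
        cmd.add("-password");        cmd.add(password);
        cmd.add("-input_file_path"); cmd.add(inputDir);
        cmd.add("-mode");            cmd.add("local");
        return cmd;
    }

    // Runs MLCP once, streaming its log output to this console, and returns its exit code.
    public static int runOnce(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical install path, connection details, and input directory.
        List<String> cmd = buildImportCommand(
                "/opt/mlcp/bin/mlcp.sh", "localhost", 8000,
                "admin", "admin", "/data/incoming");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Run now, then wait one hour after each run finishes before starting the next,
        // so two loads never overlap.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                System.out.println("mlcp exited with " + runOnce(cmd));
            } catch (IOException e) {
                System.err.println("could not start mlcp: " + e.getMessage());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 1, TimeUnit.HOURS);
    }
}
```

A plain crontab entry that calls `mlcp.sh` directly achieves the same thing with less code; an in-process scheduler like this is mainly worth it if you already run a long-lived Java process that can host it.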

- Then how is it possible to perform the following functionality, listed in the documentation (https://docs.marklogic.com/guide/admin/scheduling_tasks), using the MarkLogic Task Scheduler to bulk load content? "Loading content. For example, periodically checking for new content from an external data source, such as a web site, web service, etc." – Kiran Apr 20 '15 at 05:08
- Checking an external data source like a web site or service can be done using functions like [xdmp:http-get()](https://docs.marklogic.com/xdmp:http-get). Scheduled tasks call main modules, so you could set up a scheduled task that uses http-get to fetch something (for instance, an RSS feed) and then store it in the database, doing some transformation if needed. – Dave Cassel Apr 20 '15 at 11:24
- What I need to do is bulk load data into a MarkLogic database from the external file system on my local PC using MLCP (MarkLogic Content Pump). Can this be done as a scheduled task using the MarkLogic Task Scheduler? – Kiran Apr 21 '15 at 06:48
- Not directly. You'll need to set up Tomcat (or similar) with a service that will launch MLCP. MLCP is written in Java, so in your service code you can call directly into the MLCP JAR. The mlcp.sh script uses the `com.marklogic.contentpump.ContentPump` class, which seems like a good starting point. Alternatively, you could use [Runtime.getRuntime().exec()](http://www.mkyong.com/java/how-to-execute-shell-command-from-java/) to execute the MLCP script itself. With that in place, your scheduled task will call that service using xdmp.httpGet(). – Dave Cassel Apr 21 '15 at 11:16
You can't invoke mlcp via a scheduled task; I recommend trying something like Apache Camel for this.
Camel has a Timer component and a Quartz component, either of which can be used for scheduling.
And here's an example Camel file with a route (commented out, but still operable) that is initiated by a Timer which then writes a file to disk and ingests it via mlcp - https://github.com/rjrudin/ml-camel-client/blob/master/src/main/resources/META-INF/camel-routes.xml .
I've had good success with doing all kinds of processing/scheduling in Camel and then ultimately ingesting content via mlcp. I think it's a good fit for your use case here so you can leverage what mlcp does best - get content into MarkLogic as fast as possible.
