Questions tagged [mlcp]

MarkLogic Content Pump is an open-source, Java-based command-line tool (mlcp). mlcp provides the fastest way to import, export, and copy data to or from MarkLogic databases. It is designed for integration and automation in existing workflows and scripts.

https://developer.marklogic.com/products/mlcp

User Guide

https://docs.marklogic.com/guide/mlcp

Features

Content Pump can:

  • Bulk load billions of local files
  • Split and load large, aggregate XML files or delimited text
  • Bulk load billions of triples or quads from RDF files
  • Archive and restore database contents across environments
  • Copy subsets of data between databases
  • Load documents from HDFS, including Hadoop SequenceFiles

Data sources and destinations

Content Pump supports moving data between a MarkLogic database and any of the following:

  • Local filesystem
  • HDFS
  • MarkLogic archive
  • Another MarkLogic database

Formats

Content Pump supports:

  • XML, JSON, text, and binary files
  • RDF encoded in RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, or TriG serialization formats
  • Compressed files and archives (ZIP, GZIP)
  • MarkLogic archive, which includes both content and metadata (e.g., permissions and properties)
  • Delimited text (e.g., CSV) (import only)
  • Temporal Documents
  • Hadoop SequenceFiles

Getting Started with MLCP

You may find this free online training course helpful.

To get started moving data with mlcp, download and unpack the binaries. If you are interested in hacking on mlcp or looking at its internals, you can also download the Apache 2.0 licensed source.

To create your first import script, make sure you have an XDBC server attached to your database (running on port 8006 in the example below). From the command line, run the following, substituting your particulars.
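The original command did not survive on this page, so the following is only a minimal sketch of what a first import might look like. The host, credentials, and input directory are placeholders (assumptions, not values from the text above); only port 8006 comes from the text. The options shown (`-host`, `-port`, `-username`, `-password`, `-input_file_path`, `-mode`) are standard mlcp import options.

```shell
#!/bin/sh
# Placeholder connection details (assumptions): replace with your own.
HOST="localhost"
PORT="8006"                  # XDBC server port mentioned in the text above
MLCP_USER="admin"            # hypothetical user
MLCP_PASSWORD="admin"        # hypothetical password
INPUT_DIR="/path/to/files"   # hypothetical directory of files to load

# Build the mlcp argument list once so it can be run or just printed.
set -- import \
  -host "$HOST" -port "$PORT" \
  -username "$MLCP_USER" -password "$MLCP_PASSWORD" \
  -input_file_path "$INPUT_DIR" \
  -mode local

# Run mlcp if it is on PATH; otherwise show the command that would run.
if command -v mlcp.sh >/dev/null 2>&1; then
  mlcp.sh "$@"
else
  echo "mlcp.sh not found on PATH; would run: mlcp.sh $*"
fi
```

Building the argument list before the call keeps the placeholders in one place and makes it easy to check the command before pointing it at a real database.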

156 questions
1 vote, 1 answer

Records are not ingested correctly through MLCP when special characters are found in the CSV

I am ingesting data to MarkLogic using MLCP, but many records got skipped due to invalid characters in the file. Is there any way to ignore the invalid characters and ingest all records present in the CSV without skipping records? Below is the error…
Sam
1 vote, 1 answer

Marklogic include function in the custom transformation module

In the given function of a custom transformation module, how can I call the DHF transformation module for MLCP (/data-hub/5/transforms/mlcp-flow-transform.sjs) in order to add the envelope to the output document? function rewriteURI(content, context) { …
1 vote, 1 answer

MarkLogic Splitting XML File in custom transformation module

According to the documentation about custom transformation during mlcp ingestion, the function in the module can generate zero, one, or many output documents. How could the following document be split by the tag "person"? Would it also be possible to obtain…
Den_Alex
1 vote, 0 answers

MarkLogic Splitting Large XML Files Into Multiple Documents

If we have an input file such as: $ cat > example.xml George Washington Betsy
Den_Alex
1 vote, 1 answer

How to filter out non-json documents in MarkLogic?

I have a lot of data loaded in my database where some of the documents loaded are not JSON files & just binary files. Correct data looks like this: "/foo/bar/1.json" but the incorrect data is in the format of "/foo/bar/*". Is there a mechanism in…
Mehul
1 vote, 1 answer

MLCP export: is it possible to change the name of zip files created by mlcp export

The zip files created by export have filenames of the form "timestamp-seqnum.zip", like: 20210630144412+0530-000000-XML.zip We want to export all the documents from a MarkLogic database and then store them on the filesystem. Is there a way to modify name of…
Shabana
1 vote, 1 answer

Can MLCP read input based on a condition

In MarkLogic, using MLCP, can we read/export/import/copy data based on a condition? Example: read only files in which the student's subject element is only maths
user3636924
1 vote, 0 answers

MarkLogic Content Pump (MLCP) Error - Cannot update server-maintained properties of directory URIs

I am running MLCP to copy code into a modules database and getting the following "Cannot update server-maintained properties" error(s): 21/05/20 12:17:12 ERROR contentpump.DatabaseContentWriter: Error setting document properties for /: Cannot…
Tim Meagher
1 vote, 1 answer

running mlcp through gradle and getting Caused by: java.io.IOException: CreateProcess error=206, The filename or extension is too long

I am running a Gradle task: gradlew -b import.gradle copy_taskName -PinputHost="Host1" -PoutputHost="Host2" -Pduration=1 --stacktrace In import.gradle, there is an mlcp task where we pass a taskName.json (where all the queries are written in…
ravvi
1 vote, 1 answer

Difference between CORB and MLCP MarkLogic

Are there any differences between CORB and MLCP in MarkLogic? I see they do the same kind of job. In which scenarios would you use one versus the other?
user3636924
1 vote, 1 answer

How can I import xml into MarkLogic with namespaces that are defined in separate files?

I have some XML containing namespaces that are defined in a DTD. When I try to import the XML using the MarkLogic Content Pump (MLCP), it fails, pointing at the undefined namespaces. What is the easiest way to get this data imported? We do have an…
FaraBara
1 vote, 1 answer

mlcp export using query_filter

I am trying to export files from a MarkLogic server (installed on my machine) to my local environment, and I am getting the following error: mlcp.sh export -host localhost -port 8000 -username admin \ > -password admin -mode local…
manie
1 vote, 1 answer

export results of MarkLogic query (mlcp, xdmp.save)

I have a simple query that filters documents based on the value of a property and returns the results, e.g.: var query = 'Yes' const jsearch = require('/MarkLogic/jsearch'); const myPaths = { paths: ['/envelope/instance/entity'] }; result =…
manie
1 vote, 1 answer

MLCP privileges

I granted the hadoop-user-read privilege to developers so they can run MLCP to export data. Am I missing any additional privileges for the mlcp user? Developers always get these errors. DEBUG mapreduce.MarkLogicRecordReader: Input query:…
thichxai
1 vote, 1 answer

MLCP command options to export a binary file

I want to export a binary (PPTX) file stored in MarkLogic to my local file system. Is it possible to export a document stored in a URI through MLCP EXPORT? There are millions of documents stored in the same directory - so MLCP EXPORT with…
P K