Questions tagged [mlcp]

MarkLogic Content Pump is an open-source, Java-based command-line tool (mlcp). mlcp provides the fastest way to import, export, and copy data to or from MarkLogic databases. It is designed for integration and automation in existing workflows and scripts.

https://developer.marklogic.com/products/mlcp

User Guide

https://docs.marklogic.com/guide/mlcp

Features

Content Pump can:

  • Bulk load billions of local files
  • Split and load large, aggregate XML files or delimited text
  • Bulk load billions of triples or quads from RDF files
  • Archive and restore database contents across environments
  • Copy subsets of data between databases
  • Load documents from HDFS, including Hadoop SequenceFiles

Data sources and destinations

Content Pump supports moving data between a MarkLogic database and any of the following:

  • Local filesystem
  • HDFS
  • MarkLogic archive
  • Another MarkLogic database

Formats

Content Pump supports:

  • XML, JSON, text, binary files
  • RDF encoded in RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, or TriG serialization formats
  • Compressed files and archives (ZIP, GZIP)
  • MarkLogic archive, which includes both content and metadata (e.g., permissions and properties)
  • Delimited text (e.g., CSV) (import only)
  • Temporal Documents
  • Hadoop SequenceFiles

Getting Started with MLCP

You may find this free online training course helpful.

To get started moving data with mlcp, download and unpack the binaries. If you are interested in hacking on or looking at the internals, you can also download the Apache 2.0 licensed source.

To create your first import script, make sure you have an XDBC server attached to your database (the example assumes one running on port 8006). From the command line, run the following, substituting your particulars.
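A minimal sketch of such an import, assuming `mlcp.sh` is on your PATH. The host, credentials, and input path are placeholders chosen for illustration, not values from this page; only the port (8006) comes from the text above. The script prints the assembled command and runs it only when `RUN_MLCP=1` is set, so you can check it first.

```shell
#!/bin/sh
# Placeholder values (assumptions): substitute your own particulars.
MLCP_HOST="localhost"
MLCP_PORT="8006"                  # XDBC server port mentioned above
MLCP_USER="your-username"         # hypothetical user name
MLCP_PASS="your-password"         # hypothetical password
INPUT_PATH="/path/to/your/data"   # hypothetical directory of files to load

# Assemble the import command from standard mlcp import options.
MLCP_CMD="mlcp.sh import -host $MLCP_HOST -port $MLCP_PORT -username $MLCP_USER -password $MLCP_PASS -input_file_path $INPUT_PATH"

# Always print the command so it can be reviewed.
echo "$MLCP_CMD"

# Run it only on request, and only if mlcp.sh can be found.
# Note: the unquoted expansion splits on whitespace, so this simple
# form does not support values that contain spaces.
if [ "${RUN_MLCP:-0}" = "1" ] && command -v mlcp.sh >/dev/null 2>&1; then
    $MLCP_CMD
fi
```

For anything beyond a quick trial, consider keeping the password out of the script, for example by reading it from the environment.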

156 questions
0 votes, 1 answer

Error with MLCP copy syntax

I am using the following command mlcp.sh copy -input_host localhost -output_host localhost \ -input_database emh-entity-manager-content \ -output_database emh-schema-map-manager-content \ -input_port 8000 -input_username admin -input_password xxxxxx…
Loren Cahlander
0 votes, 1 answer

XDMP-FORESTERR: Error in merge of forest Documents: SVC-FILWRT: No space left on device

I've been trying to use the mlcp script to load RDF dataset, composed of 2091 nquads, representing a total of 727Mio triples. I've used this command so far: $ mlcp.sh import -username -password -host localhost - port 8000…
0 votes, 2 answers

EPUB loading with MLCP

MarkLogic does not 'handle' EPUB. CPF does not. MLCP does not. EPUB is a zip containing mainly xhtml, xml and pictures. I can rename it to .zip and load it with MLCP. But renaming is not so nice, it will show up in the URI unless I add a replace to…
Thijs
0 votes, 1 answer

How to do bulk update into database using MLCP

I have to update my database using MLCP. The database contains multiple collections, and in one particular collection I have to change an element or attribute. How can I achieve this?
-1 votes, 1 answer

How to import all documents from one DB to another DB in marklogic?

How can I export all documents from one DB to another DB in MarkLogic? I mean from one environment to another, such as SIT to UAT. What is the best way to do it?
-1 votes, 1 answer

Blobfuse with Azure on Linux - how to create tmp-path on Azure

We have used below command to mount Azure Blob as folder in CentOS Linux machine sudo blobfuse /mnt/azureblob/ --tmp-path=/mnt/resource/blobfusetmp --config-file=/home/mladmin/fuse_connection.cfg -o attr_timeout=240 -o…
Manish Joisar