Questions tagged [mlcp]

MarkLogic Content Pump (mlcp) is an open-source, Java-based command-line tool. mlcp provides the fastest way to import, export, and copy data to or from MarkLogic databases. It is designed for integration and automation in existing workflows and scripts.

https://developer.marklogic.com/products/mlcp

User Guide

https://docs.marklogic.com/guide/mlcp

Features

Content Pump can:

  • Bulk load billions of local files
  • Split and load large, aggregate XML files or delimited text
  • Bulk load billions of triples or quads from RDF files
  • Archive and restore database contents across environments
  • Copy subsets of data between databases
  • Load documents from HDFS, including Hadoop SequenceFiles

Data sources and destinations

Content Pump supports moving data between a MarkLogic database and any of the following:

  • Local filesystem
  • HDFS
  • MarkLogic archive
  • Another MarkLogic database

Formats

Content Pump supports:

  • XML, JSON, text, binary files
  • RDF encoded in RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, or TriG serialization formats
  • Compressed files and archives (ZIP, GZIP)
  • MarkLogic archive, which includes both content and metadata (e.g., permissions and properties)
  • Delimited text (e.g., CSV) (import only)
  • Temporal documents
  • Hadoop SequenceFiles

Getting Started with MLCP

You may find this free online training course helpful.

To get started moving data with mlcp, download and unpack the binaries. If you are interested in hacking on mlcp or looking at its internals, you can also download the Apache 2.0 licensed source.

To create your first import script, make sure you have an XDBC server attached to your database (for example, one running on port 8006). From the command line, run an mlcp import command, substituting your particulars.
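A first import along these lines might look like the sketch below. It is not the original example from this page. The host, credentials, and input directory are placeholder assumptions, and port 8006 is taken from the text above. The script only calls mlcp.sh if it is found on the PATH; otherwise it prints the command it would have run.

```shell
#!/bin/sh
# Minimal sketch of a first mlcp import. Every value here is a placeholder
# (an assumption); substitute your own particulars.
ML_HOST="localhost"
ML_PORT="8006"                 # XDBC server port from the example in the text
ML_USER="your-user"            # hypothetical user name
ML_PASS="your-password"        # hypothetical password
INPUT_DIR="/path/to/files"     # hypothetical directory of documents to load

# Collect the mlcp arguments as positional parameters so quoting is preserved.
set -- import -mode local \
  -host "$ML_HOST" -port "$ML_PORT" \
  -username "$ML_USER" -password "$ML_PASS" \
  -input_file_path "$INPUT_DIR"

if command -v mlcp.sh >/dev/null 2>&1; then
  # mlcp is installed: run the import for real.
  mlcp.sh "$@"
else
  # mlcp is not installed: show the command instead of failing.
  echo "mlcp.sh not found on PATH; would run:"
  echo "mlcp.sh $*"
fi
```

On Windows the same arguments should work with mlcp.bat. Instead of passing the arguments inline, they can also be put in a file and passed with -options_file.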

156 questions
2 votes, 2 answers

Execute MLCP Content Load Command as a scheduled task in MarkLogic

Is there any way to bulk load data using MLCP as a scheduled task in MarkLogic?
Kiran
  • 41
  • 3
2 votes, 1 answer

MarkLogic Content Pump YARN support

We are running mlcp.sh in distributed mode on CDH 5.2.4, but the job always runs in local mode; it is not submitted to the YARN ResourceManager. Has anyone successfully run mlcp on CDH 5+? We are using marklogic-contentpump-1.0.5.jar bin/mlcp.sh…
2 votes, 1 answer

MarkLogic mlcp import operation with filename-as-collection and JavaScript transform throws exception

After some experimenting with MarkLogic 8 and MarkLogic Content Pump, I'm running into an issue with importing data into a MarkLogic database. I'm trying to run an mlcp import operation to load data from a set of CSV files with input settings like…
Gertjan
  • 41
  • 3
2 votes, 3 answers

MarkLogic MLCP: how to get number of records inserted?

I am loading data using mlcp. After this process completes, how can I get the number of documents inserted into the database? Edit: Actually I am starting this MLCP process from Java and I want the inserted record count in the Java application. How…
user3463568
  • 103
  • 1
  • 6
2 votes, 1 answer

MLCP Bulk Loading

I have almost 10,000 small XML files and I am loading them into MarkLogic through MLCP. During ingestion I apply a transformation, and the main thing it does is update a dictionary. I am updating the dictionary from the input…
Navin Rawat
  • 3,208
  • 1
  • 19
  • 31
2 votes, 1 answer

Loading CSV (or TSV) into MarkLogic with automatic encoding

I have successfully loaded a very clean (plain English, no fancy symbols or images) CSV file into MarkLogic using MLCP (MarkLogic Content Pump) so that it would take the first row as the column names, and I've learned that when I try to load…
2 votes, 2 answers

Marklogic Content Pump Issue

I am trying to load a dbPedia dataset in .nt format into MarkLogic using the MarkLogic Content Pump. I'm using MarkLogic 7, with an XDBC server running on port 8005 on my machine. My data is present in a file, persondata_en.nt, and I am using the…
1 vote, 1 answer

mlcp export with document_selector

I have the following XQuery to retrieve documents in qconsole: declare namespace xsi = "http://www.w3.org/2001/XMLSchema-instance"; (/record[@xsi:noNamespaceSchemaLocation eq 'http://foobar.xsd']); and I am trying to export the data through…
CJ Chang
  • 325
  • 4
  • 12
1 vote, 1 answer

ERROR mapreduce.ContentWriter: Batch 1363075985.8: Document failed permanently: /WDS/raw/aip/Asset/nuDJVK6AQv.json

I am using MLCP copy mode to copy data from the DEV environment to the UAT environment. When I try to load, I get an error like: 23/05/18 06:58:19 WARN mapreduce.ContentWriter: Batch 124180459.0: SEC-PRIV: Need privilege:…
1 vote, 1 answer

Can we replace the default document URI with a value from the document itself during mlcp ingestion in MarkLogic?

I want to replace the default document URI of the file with a value from the file's content. For example, the default URI is /test/Invoice.xml and I want to replace it with /Invoice_{current date time from file from field DateCreated}.xml The…
1 vote, 1 answer

Can we change the XML structure of a file during file ingestion in MarkLogic using mlcp?

I have an XML file to ingest into a MarkLogic database, and a new XML field has to be added to it. The requirement is to add that XML field only during the mlcp import. Is this possible in MarkLogic using XQuery? XML file now…
1 vote, 1 answer

Unable to export single document in MarkLogic using MLCP

I am trying to use the mlcp.bat to extract the following document with URI: /category/[2014] xxx.xml This is the mlcp command used with parameters: mlcp.bat export -host localhost -port 8000 -username admin -password admin -mode local -database…
Eugene
  • 1,013
  • 1
  • 22
  • 43
1 vote, 0 answers

MLCP copy only copies partial data from a 3-node cluster

I have a very basic 3-node cluster ML environment. I need to copy the data to another ML environment with the MLCP copy command. However, it seems to copy only about 1/3 of the documents. I have no clue what is wrong. Below is the…
1 vote, 1 answer

Mlcp command input_file_path problems with regex

I want to change input_file_path = "C:\\Marklogic\\database-image\\data" to ".*database-image//data.*" but this regex is not working in this command. Is something wrong with my regex?
1 vote, 1 answer

MarkLogic MLCP output_uri_replace fails

I'm trying to use a regular expression with -output_uri_replace, and it is failing. Here is my options file: import -host localhost -port 8877 -username xxxx -password xxxx -input_file_path to-import -output_uri_replace ^.*(/[^/]+/),$1 I…