Questions tagged [mlcp]

MarkLogic Content Pump (mlcp) is an open-source, Java-based command-line tool. mlcp provides a fast way to import, export, and copy data to or from MarkLogic databases, and it is designed for integration and automation in existing workflows and scripts.

https://developer.marklogic.com/products/mlcp

User Guide

https://docs.marklogic.com/guide/mlcp

Features

Content Pump can:

  • Bulk load billions of local files
  • Split large aggregate XML files or delimited text files into individual documents and load them
  • Bulk load billions of triples or quads from RDF files
  • Archive and restore database contents across environments
  • Copy subsets of data between databases
  • Load documents from HDFS, including Hadoop SequenceFiles
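
As an illustration of splitting delimited text, the bash sketch below builds an mlcp import command that turns each row of a pipe-delimited file into its own JSON document. The host, port, credentials, file path, and collection name are placeholders, not values from this page. `run_mlcp` is a hypothetical helper that runs `mlcp.sh` when it is installed and otherwise only prints the command.

```shell
#!/usr/bin/env bash
# Sketch: split a delimited text file into one JSON document per row.
# All connection details, the file path, and the collection name are placeholders.
ML_HOST="${ML_HOST:-localhost}"
ML_PORT="${ML_PORT:-8006}"
ML_USER="${ML_USER:-your-user}"
ML_PASS="${ML_PASS:-your-password}"
CSV_FILE="${CSV_FILE:-/tmp/people.csv}"   # hypothetical input file

# Hypothetical helper: run mlcp.sh if it is on PATH, else print the quoted command.
run_mlcp() {
  if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
  else
    printf 'mlcp.sh'
    printf ' %q' "$@"
    printf '\n'
  fi
}

# A bash array keeps values such as the '|' delimiter safely quoted.
args=(
  import
  -host "$ML_HOST" -port "$ML_PORT"
  -username "$ML_USER" -password "$ML_PASS"
  -mode local
  -input_file_path "$CSV_FILE"
  -input_file_type delimited_text
  -delimiter '|'
  -document_type json
  -output_collections people
)

run_mlcp "${args[@]}"
```

By default mlcp takes each document URI from the value in the first column. `-uri_id` names a different column to use instead.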

Data sources and destinations

Content Pump supports moving data between a MarkLogic database and any of the following:

  • Local filesystem
  • HDFS
  • MarkLogic archive
  • Another MarkLogic database
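
For the database-to-database case, here is a minimal bash sketch of an `mlcp copy` command. It assumes both databases are reachable through app servers on the same port. The hosts, port, credentials, and collection name are placeholders. `-collection_filter` restricts the copy to a single collection, which is one way to copy only a subset of the data. `run_mlcp` is a hypothetical helper that prints the command when `mlcp.sh` is not installed.

```shell
#!/usr/bin/env bash
# Sketch: copy one collection from a source MarkLogic database to a target one.
# Hosts, port, credentials, and the collection name are placeholders.
SRC_HOST="${SRC_HOST:-source.example.com}"
DST_HOST="${DST_HOST:-target.example.com}"
ML_PORT="${ML_PORT:-8006}"
ML_USER="${ML_USER:-your-user}"
ML_PASS="${ML_PASS:-your-password}"

# Hypothetical helper: run mlcp.sh if it is on PATH, else print the quoted command.
run_mlcp() {
  if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
  else
    printf 'mlcp.sh'
    printf ' %q' "$@"
    printf '\n'
  fi
}

# copy uses separate -input_* and -output_* connection options.
args=(
  copy
  -mode local
  -input_host "$SRC_HOST" -input_port "$ML_PORT"
  -input_username "$ML_USER" -input_password "$ML_PASS"
  -output_host "$DST_HOST" -output_port "$ML_PORT"
  -output_username "$ML_USER" -output_password "$ML_PASS"
  -collection_filter my-collection
)

run_mlcp "${args[@]}"
```

`-directory_filter` is an alternative to `-collection_filter` when the subset you want is defined by URI directory.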

Formats

Content Pump supports:

  • XML, JSON, text, binary files
  • RDF encoded in RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, or TriG serialization formats
  • Compressed files and archives (ZIP, GZIP)
  • MarkLogic archive, which includes both content and metadata (e.g., permissions and properties)
  • Delimited text, such as CSV (import only)
  • Temporal documents
  • Hadoop SequenceFiles
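
The bash sketch below shows one way to use the MarkLogic archive format. It exports a database to an archive directory and then imports that archive into another database. An archive carries metadata such as permissions and properties along with the content. Hosts, port, credentials, and the archive path are placeholders. `run_mlcp` is a hypothetical helper that prints each command when `mlcp.sh` is not installed.

```shell
#!/usr/bin/env bash
# Sketch: archive a database with mlcp export, then restore it with mlcp import.
# Hosts, port, credentials, and ARCHIVE_DIR are placeholders.
SRC_HOST="${SRC_HOST:-source.example.com}"
DST_HOST="${DST_HOST:-target.example.com}"
ML_PORT="${ML_PORT:-8006}"
ML_USER="${ML_USER:-your-user}"
ML_PASS="${ML_PASS:-your-password}"
ARCHIVE_DIR="${ARCHIVE_DIR:-/tmp/ml-archive}"

# Hypothetical helper: run mlcp.sh if it is on PATH, else print the quoted command.
run_mlcp() {
  if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
  else
    printf 'mlcp.sh'
    printf ' %q' "$@"
    printf '\n'
  fi
}

# Export content plus metadata as a MarkLogic archive.
export_args=(
  export
  -mode local
  -host "$SRC_HOST" -port "$ML_PORT"
  -username "$ML_USER" -password "$ML_PASS"
  -output_type archive
  -output_file_path "$ARCHIVE_DIR"
)

# Restore the same archive into the target database.
import_args=(
  import
  -mode local
  -host "$DST_HOST" -port "$ML_PORT"
  -username "$ML_USER" -password "$ML_PASS"
  -input_file_type archive
  -input_file_path "$ARCHIVE_DIR"
)

# Only attempt the restore if the export step succeeded.
run_mlcp "${export_args[@]}" && run_mlcp "${import_args[@]}"
```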

Getting Started with MLCP

You may find this free online training course helpful.

To get started moving data with mlcp, download and unpack the binaries. If you are interested in hacking on or looking at the internals, you can also download the Apache 2.0 licensed source.

To create your first import script, make sure you have an XDBC server attached to your database (for example, one listening on port 8006). Then run an mlcp import command from the command line, substituting your own host, port, credentials, and input path.
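A minimal first import might look like the bash sketch below. It assumes an XDBC server on localhost:8006, as described above. The credentials and input directory are placeholders you would substitute. Building the arguments in a bash array keeps values safely quoted. `run_mlcp` is a hypothetical helper that prints the command instead of running it when `mlcp.sh` is not on your PATH.

```shell
#!/usr/bin/env bash
# Sketch: first mlcp import of local files into MarkLogic.
# Assumes an XDBC server on localhost:8006; credentials and INPUT_DIR are placeholders.
ML_HOST="${ML_HOST:-localhost}"
ML_PORT="${ML_PORT:-8006}"
ML_USER="${ML_USER:-your-user}"
ML_PASS="${ML_PASS:-your-password}"
INPUT_DIR="${INPUT_DIR:-/tmp/mlcp-input}"

# Hypothetical helper: run mlcp.sh if it is on PATH, else print the quoted command.
run_mlcp() {
  if command -v mlcp.sh >/dev/null 2>&1; then
    mlcp.sh "$@"
  else
    printf 'mlcp.sh'
    printf ' %q' "$@"
    printf '\n'
  fi
}

# Document URIs default to the local file paths; -output_uri_replace takes
# "regex,'replacement'" and here strips the INPUT_DIR prefix from each URI.
args=(
  import
  -host "$ML_HOST" -port "$ML_PORT"
  -username "$ML_USER" -password "$ML_PASS"
  -mode local
  -input_file_path "$INPUT_DIR"
  -output_uri_replace "${INPUT_DIR},''"
)

run_mlcp "${args[@]}"
```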

156 questions
0 votes, 2 answers

MarkLogic export fails due to exceeded time limit

I am exporting huge PDF files, some of them over 1 GB, and I have already reduced thread_count to 4. What else do I need to do to avoid the timeout? Thanks. ERROR contentpump.DatabaseContentReader: RuntimeException reading /pdf/docIns/docIns- 222581.pdf…
thichxai
0 votes, 0 answers

Automate running MarkLogic JavaScript files and code using an ml-gradle task

I am trying to automate running data-fix scripts using an ml-gradle task. My scripts sit in a directory like c:/data/scripts/Release_123. Release_123 can also have subdirectories, and each subdirectory will have data-fix scripts (SJS files) that need to…
0 votes, 0 answers

How may I specify layout or format metadata's xml after an export action on MLCP MarkLogic

I have an internal integration test, and I need to validate a metadata file against an expected file. Everything works OK on my PC (Windows 10 Pro, MarkLogic running in Docker, Spanish language), but when executing on Linux I have a slightly…
0 votes, 1 answer

mlcp behaves differently for different input directory paths

I am using mlcp v9.0.4 to load data into MarkLogic v9.0.9, and I am trying to figure out the following: if a CSV file has no data rows and contains only the column names, the file never gets loaded. How can I overcome this and load the empty…
Bharadwaj
0 votes, 1 answer

How do I load a CSV file into MarkLogic as one JSON file and not individual files?

I have a CSV file with a | delimiter. I want to load it into MarkLogic using mlcp as one single JSON document instead of multiple JSON documents (one JSON document per row). CSV:

Name | age | gender
Steve | 30 | M
Rogers | 28 | M

JSON…
Mehul
0 votes, 1 answer

MarkLogic Cluster - Add data in 1st host & update in 2nd host throws error

The MarkLogic setup is as follows: 3 hosts. Data configuration: 1 master forest on each host and 1 replica for each host on a different host. We have a MarkLogic cluster (3 hosts, with failover) deployed on Azure VMs. We are using MarkLogic ContentPump…
Manish Joisar
0 votes, 1 answer

Passing a parameter to a JavaScript transformation in MarkLogic

I have a JavaScript transformation that I use when loading a CSV into the database with MLCP. My function accepts content and context. I have 2 other parameters that I need to pass through MLCP so that I can use them in the transformation. Can I use…
Mehul
0 votes, 2 answers

MarkLogic - Forest data folder & Azure Blob

Technical stack: MarkLogic 9.0, CentOS Linux, Azure Blob, Blobfuse. To make sure we do not have to worry about data disk size for the MarkLogic forest, we have mapped Azure Blob to a folder on the Linux machine, so we do not have to worry about disk…
Manish Joisar
0 votes, 1 answer

MarkLogic Content Pump (MLCP) - Performance - Logging details

We are using MLCP to ingest XML data (in a ZIP file) into MarkLogic, and it is working as expected. But when I look at the output on screen, I see something odd: the first 25% takes 7 minutes and the rest takes 1 minute. Is it real or is it something to…
Manish Joisar
0 votes, 1 answer

Any way to call mlcp from Java apps?

I'm new to MarkLogic and mlcp, and I'm working on MarkLogic 9.0-8. I want to use mlcp to load content, but since some parameters may need to be built dynamically based on the content, does anyone know if it is possible to call mlcp from Java…
Helen
0 votes, 2 answers

How to denormalize data in documents in MarkLogic?

I have a bunch of normalized documents that I've loaded from CSV files using MLCP. How can I use the primary key (say, ID) to locate all the related documents and merge them into one denormalized document? I also need to change some value in the…
Mehul
0 votes, 1 answer

In MarkLogic, how to add custom document properties to all documents?

I'm loading JSON documents using mlcp from a CSV into my db. I want to add a property to all those files and later be able to search the documents based on the property value. How can I do that using transformations? Using…
Mehul
0 votes, 1 answer

Ingestion Failed while in LOAD BALANCER MLCP, MARKLOGIC

I'm using mlcp behind a load balancer: I have 8 nodes that are load-balanced behind one IP, and mlcp connects to that IP. When I kill one node during ingestion, mlcp stops and waits for the connection, and then some documents are not ingested. I did this…
Falcon Ryu
0 votes, 1 answer

MLCP Import with custom transform module

I am not able to import documents with the custom transform module option. I am trying to import through mlcp as a Gradle task over SSL. When I run the task, it builds successfully but does not import any modules. Code: task DeployPatterns(type:…
mpuram
0 votes, 1 answer

MarkLogic 9 MLCP ingest from URL does not work

I am using MarkLogic 9 and want to ingest data from a website (URL), which returns a JSON string as its result. I tried this with MarkLogic Content Pump (MLCP) using the following statement: mlcp.sh import -mode local -host localhost -port 8000…
Erik hoeven