Questions tagged [mlcp]

MarkLogic Content Pump is an open-source, Java-based command-line tool (mlcp). mlcp provides the fastest way to import, export, and copy data to or from MarkLogic databases. It is designed for integration and automation in existing workflows and scripts.

https://developer.marklogic.com/products/mlcp

User Guide

https://docs.marklogic.com/guide/mlcp

Features

Content Pump can:

  • Bulk load billions of local files
  • Split and load large, aggregate XML files or delimited text
  • Bulk load billions of triples or quads from RDF files
  • Archive and restore database contents across environments
  • Copy subsets of data between databases
  • Load documents from HDFS, including Hadoop SequenceFiles

Data sources and destinations

Content Pump supports moving data between a MarkLogic database and any of the following:

  • Local filesystem
  • HDFS
  • MarkLogic archive
  • Another MarkLogic database
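
As a rough illustration of two of these paths, the sketch below prints (rather than runs) an `export` to a MarkLogic archive on the local filesystem and a `copy` between two databases. The host names, port, credentials, launcher path, and archive directory are placeholders, not values from this page. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences so the script can be checked without a live server.

```shell
#!/bin/sh
# Sketch only: every host, port, credential, and path below is a placeholder.
MLCP_BIN="./bin/mlcp.sh"          # assumed location of the mlcp launcher
SRC_HOST="source.example.com"     # hypothetical source host
DST_HOST="target.example.com"     # hypothetical target host
XDBC_PORT="8006"                  # assumed XDBC port on both hosts
ML_USER="ml-user"                 # placeholder credentials
ML_PASS="ml-password"
ARCHIVE_DIR="/tmp/ml-archive"     # placeholder output directory
DRY_RUN="${DRY_RUN:-1}"           # 1 = print the command, 0 = execute it

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Database -> MarkLogic archive on the local filesystem.
export_archive() {
  run "$MLCP_BIN" export \
    -host "$SRC_HOST" -port "$XDBC_PORT" \
    -username "$ML_USER" -password "$ML_PASS" \
    -output_file_path "$ARCHIVE_DIR" -output_type archive
}

# Database -> another MarkLogic database.
copy_database() {
  run "$MLCP_BIN" copy \
    -input_host "$SRC_HOST" -input_port "$XDBC_PORT" \
    -input_username "$ML_USER" -input_password "$ML_PASS" \
    -output_host "$DST_HOST" -output_port "$XDBC_PORT" \
    -output_username "$ML_USER" -output_password "$ML_PASS"
}

export_archive
copy_database
```

Setting `DRY_RUN=0` would execute the commands instead of printing them, which assumes mlcp is unpacked at the `MLCP_BIN` path and the servers are reachable.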

Formats

Content Pump supports:

  • XML, JSON, text, binary files
  • RDF encoded in RDF/XML, Turtle, RDF/JSON, N3, N-Triples, N-Quads, or TriG serialization formats
  • Compressed files and archives (ZIP, GZIP)
  • MarkLogic archive, which includes both content and metadata (e.g., permissions and properties)
  • Delimited text (e.g., CSV) (import only)
  • Temporal Documents
  • Hadoop SequenceFiles
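
On import, the format is usually selected with the `-input_file_type` option. The helper below is a small sketch of that mapping: the labels on the left (`csv`, `aggregate-xml`, and so on) are invented for this example, while the option values on the right are the ones the mlcp user guide lists, to the best of my knowledge. Check the guide for the version you are running.

```shell
#!/bin/sh
# Sketch only: the format labels on the left are made up for this example;
# the -input_file_type values on the right are the ones mlcp documents.
input_type_args() {
  case "$1" in
    documents)     echo "-input_file_type documents" ;;
    aggregate-xml) echo "-input_file_type aggregates" ;;
    csv)           echo "-input_file_type delimited_text" ;;
    rdf)           echo "-input_file_type rdf" ;;
    archive)       echo "-input_file_type archive" ;;
    sequencefile)  echo "-input_file_type sequencefile" ;;
    *)
      echo "unknown format label: $1" >&2
      return 1
      ;;
  esac
}

# Example: print the option for a CSV import.
input_type_args csv
```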

Getting Started with MLCP

You may find this free online training course helpful.

To get started moving data with mlcp, download and unpack the binaries. If you are interested in hacking on mlcp or looking at its internals, you can also download the Apache 2.0 licensed source.

To create your first import script, make sure you have an XDBC server attached to your database (running on port 8006 in the example below). From the command line, run the following, substituting your particulars.
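
The original command is not reproduced on this page, so the following is only a minimal sketch of what a first import might look like, not the tag wiki's own example. Port 8006 comes from the text above. The launcher path, host, credentials, and input path are placeholders, and `DRY_RUN` is a hypothetical switch that prints the command instead of running it.

```shell
#!/bin/sh
# Sketch only: substitute your own particulars for every value below.
MLCP_BIN="./bin/mlcp.sh"        # assumed location of the mlcp launcher
ML_HOST="localhost"             # placeholder host
ML_PORT="8006"                  # XDBC server port mentioned in the text
ML_USER="ml-user"               # placeholder credentials
ML_PASS="ml-password"
INPUT_PATH="/path/to/input"     # placeholder file or directory to load
DRY_RUN="${DRY_RUN:-1}"         # 1 = print the command, 0 = execute it

# Build the import arguments as positional parameters so quoting survives.
set -- import \
  -host "$ML_HOST" -port "$ML_PORT" \
  -username "$ML_USER" -password "$ML_PASS" \
  -input_file_path "$INPUT_PATH"

IMPORT_PREVIEW="$MLCP_BIN $*"

if [ "$DRY_RUN" = "1" ]; then
  echo "$IMPORT_PREVIEW"
else
  "$MLCP_BIN" "$@"
fi
```

Printing first makes it easy to confirm the host, port, and path before anything is written to the database.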

156 questions
1
vote
1 answer

Issue with uri while loading data in database

Can I use both command -generate_uri and -uri_id {any one element name} while inserting data into database through MLCP for getting unique uri? Or can I use multiple element name (-uri_id {first element name}, {second element name}) in MLCP command.…
Raj
  • 177
  • 10
1
vote
1 answer

Marklogic mlcp - option to delete input files from filesystem

Is there any option in MLCP to delete my input files after they are loaded successfully into ML database? I am running my MLCP scripts, NOT from the same server where my ML is running. Let me know if there are any Params to delete it. Recordloader…
Selva
  • 237
  • 1
  • 7
1
vote
0 answers

MLCP database to database copy. Collection names coming through with and without quotes

The source database has 6 unquoted collection names. After an mlcp db-to-db copy, the target database has replica collection names in both quoted and unquoted versions. When I restore the source db's backup onto the target db the latter has only the expected…
Guy Yeates
  • 85
  • 4
1
vote
1 answer

Marklogic: mlcp permission issue while importing

I am on Marklogic 8.0.4 mlcp. Following is the command I run: ./bin/mlcp.sh import -host localhost -username admin -password admin -input_file_path /file/path/to/RDF.owl -input_file_type RDF This is my log: 6/03/12 12:07:05 INFO…
Kunal
  • 2,929
  • 6
  • 21
  • 23
1
vote
1 answer

XML file upload using MLCP

We are trying to upload XML files (some of them 2GB) using MLCP, but they are not getting uploaded to the database. I created a new database, forest, and port. Made changes to mlcp.bat as below set OPTFILE="load_mlcp.txt" call…
1
vote
2 answers

Marklogic Content Pump generate multiple documents through XSLT transform

This is the second question related to the MarkLogic Content Pump utility. I am ingesting a single aggregated XML document with multiple records into MarkLogic Content Pump. I expect the aggregate XML document to be transformed to a different…
vish.Net
  • 962
  • 2
  • 10
  • 21
1
vote
1 answer

Marklogic Content Pump and XSLT transformation

I am using MarkLogic Content Pump to ingest XML documents. I would like to transform these XML documents in the mlcp ingestion process using the “-transform_module and -transform_namespace” options. I have already created the XSLT for the transformation…
vish.Net
  • 962
  • 2
  • 10
  • 21
1
vote
1 answer

MLCP aggregated XML

I try to load aggregated XML files using MLCP into ML8. This is my data:
Thijs
  • 1,423
  • 15
  • 38
1
vote
1 answer

Using MarkLogic Content Pump to load triples Java

Is there any Java example or JavaDoc for MarkLogic Content Pump (MLCP)? I have MLCP dependencies added by Maven without any problem.
bsz
  • 321
  • 1
  • 9
0
votes
0 answers

can we use xdmp:document-insert in mlcp transformation query?

I am trying to load binary files (PDF) via mlcp, using a transform to also load a metadata XML document for each PDF. But I am getting the error below - xmlns:error="http://marklogic.com/xdmp/error"…
anuj_gupta
  • 131
  • 4
0
votes
0 answers

Strange MLCP issue: ERROR c.m.contentpump.LocalJobRunner - Unable to query destination information, no usable hostname found

I am using the official MarkLogic Multi-Model Database: Enterprise Edition v.11 image found in Azure Marketplace to set up a CICD pipeline including the 3 node ml cluster infra deployment. That is step 1. After that infra deployment, it will run…
0
votes
1 answer

Fail to import large files size use MLCP utilities to MarkLogic database

I have a large PDF file (1GB) that fails to load into MarkLogic. Is there a way for mlcp to split the large file into small files, then merge them back into a single PDF after loading into the database? skipp record () in file:/data2022/ABO2022-129.pdf,…
thichxai
  • 1,073
  • 1
  • 7
  • 16
0
votes
0 answers

MarkLogic mlcp import not working - Unable to connect to localhost to query destination information

I am running mlcp import within a Docker container. I first installed the MarkLogic application and mlcp on a CentOS 7 machine, and on top of the installed application I am trying to import an XML file using the mlcp command. After the installation I…
0
votes
0 answers

getting not a usable net address error , in between a mlcp job run

We are running an mlcp job to redact data from one server to another. The whole process takes 2 days to complete, but now, after running for 10 hours, it is giving 'Default provider - Not a usable net address: outputhost:8000 ERROR…
ravvi
  • 117
  • 6
0
votes
1 answer

How to set up effective bidirectional document change sync between MarkLogic DB and file system?

MLCP could be used to export and import documents from file system to and from ML DB. However it is ineffective to import and export everything. Only delta changes could be synced. How to do that? The first question is how to detect a delta change…