MLCP with Stream of Data

Question

Not sure whether this is a valid question or not...

Requirement: I am going to write an application that captures a huge amount of data from an external REST endpoint. I want to use MLCP to store that stream of data in MarkLogic.

Is it possible using MLCP?

Please suggest possible solutions.

score 3 · Accepted Answer · answered Sep 07 '18 at 21:56


DMSDK (the Data Movement SDK) might help to meet your requirements:

http://docs.marklogic.com/guide/java/data-movement

ehennum


MLCP is good for loading from the file system or another MarkLogic instance. DMSDK is the way to go for streaming from an external source. – Dave Cassel Sep 08 '18 at 00:14
Right. What Mr. Cassel said. – ehennum Sep 08 '18 at 20:46

score 2 · Answer 2 · edited Sep 08 '18 at 00:52

If by "stream" you mean unbounded in space and time, and by "huge" you mean multiple GB or more, then no: MLCP is not the right choice, or at least not sufficient on its own. MLCP is a command-line 'batch' program; you need to have all your data already stored locally before starting it, so it is not 'streaming' in this sense.

In any case, you will need to split up your data before sending it to MarkLogic, ideally into chunks (documents) smaller than 100 MB (not a magic number, just a good upper bound). So your streaming code needs to read the data, buffer it, split it into 'chunks', and then send those chunks to ML. Once the data is in chunks, any API to ML will work, including MLCP. There are performance and usability tradeoffs between the different APIs; I'll leave that for another discussion.
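The read, buffer, split steps described above can be sketched as a small Python generator. This is a minimal illustration, not MarkLogic code. It assumes the incoming stream has already been parsed into JSON-serializable records; the function name `chunk_stream` and the choice to pack each chunk as a JSON array are hypothetical. It only does the splitting. Sending each chunk to MarkLogic is left to whichever API you pick.

```python
import json
from typing import Iterable, Iterator, List


def chunk_stream(records: Iterable[dict],
                 max_bytes: int = 100 * 1024 * 1024) -> Iterator[bytes]:
    """Hypothetical helper: buffer records from a (possibly unbounded)
    stream and yield JSON-array documents whose encoded size is at most
    max_bytes. The 100 MB default mirrors the suggested upper bound."""
    buffer: List[bytes] = []
    size = 2  # bytes for the enclosing "[" and "]"
    for record in records:
        encoded = json.dumps(record, separators=(",", ":")).encode("utf-8")
        if 2 + len(encoded) > max_bytes:
            # A single record that cannot fit in any chunk is an error.
            raise ValueError("a single record exceeds max_bytes")
        # Cost of appending this record, including a "," separator
        # when the buffer already holds something.
        extra = len(encoded) + (1 if buffer else 0)
        if buffer and size + extra > max_bytes:
            # Current chunk is full: emit it and start a new one.
            yield b"[" + b",".join(buffer) + b"]"
            buffer, size = [], 2
            extra = len(encoded)
        buffer.append(encoded)
        size += extra
    if buffer:
        # Flush whatever is left when the stream ends.
        yield b"[" + b",".join(buffer) + b"]"


if __name__ == "__main__":
    # Simulated stream standing in for the external REST endpoint.
    fake = ({"id": i, "payload": "x" * 10} for i in range(1000))
    for n, chunk in enumerate(chunk_stream(fake, max_bytes=4096)):
        print(n, len(chunk))
```

Because the generator pulls one record at a time and keeps only about one chunk's worth of data in memory, it also works on an unbounded source. Each yielded chunk could be written to a local file for a later MLCP run, or sent straight to MarkLogic (from Java via DMSDK, or over the REST API).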
