MarkLogic Content Pump (MLCP) - Performance - Logging details

Question

We are using MLCP to ingest XML data (in a ZIP file) into MarkLogic.

It is working as expected.

When I looked at the output on screen, I saw something odd.

The first 25% takes 7 minutes and the remaining 75% takes about 1 minute. Is that real, or does it have something to do with when the progress is logged? Also, after the 0% line it takes quite a while before the 25% line appears on screen.

Am I missing something in how I run mlcp, or does it take more time at the beginning and then run faster?
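One way to check whether the slow first 25% is real is to time the gaps between progress lines instead of eyeballing the console. This sketch assumes each mlcp progress line starts with a "yy/MM/dd HH:mm:ss" timestamp and contains "completed N%" (check this against your own console output), and that the output was saved to a file. It only measures when progress was printed, so a long first gap on its own does not prove the load itself was slow.

```python
import re
from datetime import datetime

# Assumption: progress lines look like
#   "<yy/MM/dd HH:mm:ss> INFO ... completed <N>%"
# Adjust the pattern if your mlcp log layout differs.
PROGRESS_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) .*?completed\s+(?P<pct>\d+)%"
)


def progress_segments(lines):
    """Return (start_pct, end_pct, seconds) for each pair of consecutive progress lines."""
    points = []
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match:
            stamp = datetime.strptime(match.group("ts"), "%y/%m/%d %H:%M:%S")
            points.append((int(match.group("pct")), stamp))
    # Pair each progress point with the next one and measure the wall-clock gap.
    return [
        (p0, p1, (t1 - t0).total_seconds())
        for (p0, t0), (p1, t1) in zip(points, points[1:])
    ]
```

For example, progress_segments(open("mlcp.log")) returns one (start %, end %, seconds) tuple per interval; mlcp.log is a hypothetical file holding the captured console output.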

score 1 · Accepted Answer · answered Mar 21 '19 at 10:45

I prefer the following settings when using mlcp:

Disable locking on the target database (locking = off)
Disable journaling on the target database (journaling = off)
Adjust max_split_size and thread_count in the mlcp options. I used 1000 and 12 respectively, which gave the best performance in my setup.
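As a sketch of the last point, the helper below assembles an mlcp import command for ZIP-compressed XML using the values quoted above (thread_count 12, max_split_size 1000). The function name, host, port 8000, admin/admin credentials and the /data/xml-zips path are hypothetical placeholders; only the option names are mlcp's own.

```python
import shlex


def build_mlcp_import(input_path, host="localhost", port=8000,
                      username="admin", password="admin",
                      thread_count=12, max_split_size=1000):
    """Build an mlcp import command as an argument list (hypothetical helper)."""
    return [
        "mlcp.sh", "import",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        "-input_file_path", input_path,
        # The input is ZIP archives of XML documents.
        "-input_compressed", "true",
        "-input_compression_codec", "zip",
        # Tuning values from the answer; measure on your own cluster.
        "-thread_count", str(thread_count),
        "-max_split_size", str(max_split_size),
    ]


cmd = build_mlcp_import("/data/xml-zips")  # hypothetical input directory
print(shlex.join(cmd))  # inspect the command before running it
```

Once the printed command looks right, it can be run with subprocess.run(cmd, check=True). The best values depend on the cluster, so treat 12 and 1000 as a starting point to measure against rather than universal settings.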

Hope this gives you some pointers.

Ranvir

Thanks for your reply. I am new to MarkLogic, so what happens when locking or journaling is off? I will try the max_split_size and thread_count options. – Manish Joisar Mar 21 '19 at 11:53

You may refer to https://docs.marklogic.com/guide/admin/databases and search for journaling / locking. – Ranvir Mar 21 '19 at 17:21
Thanks Ranvir, I read it; very useful. About the split size and thread_count values: the docs talk about being I/O bound, and ingestion will always be I/O bound, so on what basis can we decide? In our case it will be more than 12M records (in more than 1000 ZIP files), a one-time activity in each environment and later in live. Will the values you mentioned above improve performance here? – Manish Joisar Mar 21 '19 at 20:26
After setting journaling = off, the progress percentage started being reported properly, and setting the split size and thread count as you mentioned also improved performance by around 25-30%. Thanks for your help. – Manish Joisar Apr 04 '19 at 12:47
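The database settings discussed in this thread can also be changed without the Admin UI. The sketch below builds a PUT request for the MarkLogic Management REST API, assumed here to be on port 8002 at /manage/v2/databases/{name}/properties with digest authentication, to turn journaling and locking off for a bulk load. The database name my-content-db is hypothetical, and "fast" is assumed to be the default value to restore afterwards. Both settings trade safety for load speed, so check the admin guide linked above before using them outside a one-time load.

```python
import json
import urllib.parse
import urllib.request


def build_db_properties_request(database, properties, host="localhost", port=8002):
    """Build (but do not send) a PUT to the assumed database properties endpoint."""
    url = (f"http://{host}:{port}/manage/v2/databases/"
           f"{urllib.parse.quote(database)}/properties")
    body = json.dumps(properties).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def send_with_digest_auth(request, username, password):
    """Send the request with HTTP digest authentication and return the status code."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, request.full_url, username, password)
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(password_mgr)
    )
    with opener.open(request) as response:
        return response.status


# Settings from the answer, applied before the bulk load.
BULK_LOAD_SETTINGS = {"journaling": "off", "locking": "off"}
# Assumption: "fast" is the default for both, restored once the load finishes.
RESTORE_SETTINGS = {"journaling": "fast", "locking": "fast"}
```

For example, call send_with_digest_auth(build_db_properties_request("my-content-db", BULK_LOAD_SETTINGS), "admin", "admin") before running mlcp, and repeat it with RESTORE_SETTINGS afterwards (the credentials are placeholders).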