We are using MLCP to ingest XML data (in a zip file) into MarkLogic.

It is working as expected.

When I looked at the output on screen, I saw something odd.

The first 25% takes 7 minutes and the rest takes 1 minute. Is that real, or does it have to do with the progress being logged at a later stage? Also, between 0% and 25%, it takes some time before any output appears on screen.

Am I missing something in the mlcp execution, or does it take more time at the beginning and then run faster?

Manish Joisar

1 Answer

I prefer the following settings when using mlcp:

  1. Disable locking on the database (locking = off).
  2. Disable journaling on the database (journaling = off).
  3. Adjust max_split_size and thread_count in the mlcp options. I used 1000 and 12 respectively for optimal performance in my setup.
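
As a rough illustration of point 3, here is a minimal Python sketch that assembles an mlcp import command with those two options set. The helper name `build_mlcp_import_command`, the host, port, credentials and input path are all placeholders assumed for the example, not values from the question. The 1000 and 12 are simply the numbers from my setup and may not suit yours. The script only builds and prints the command; it does not run mlcp.

```python
import shlex


def build_mlcp_import_command(
    input_path,
    host="localhost",      # placeholder host (assumption)
    port=8000,             # placeholder app server port (assumption)
    username="admin",      # placeholder credentials (assumption)
    password="admin",
    thread_count=12,       # value quoted in the answer
    max_split_size=1000,   # value quoted in the answer
):
    """Return an mlcp import command line as a list of arguments."""
    return [
        "mlcp.sh", "import",
        "-host", host,
        "-port", str(port),
        "-username", username,
        "-password", password,
        "-input_file_path", input_path,
        # The input is zipped XML, so tell mlcp to unpack it.
        "-input_compressed", "true",
        "-input_compression_codec", "zip",
        # Tuning options from point 3.
        "-thread_count", str(thread_count),
        "-max_split_size", str(max_split_size),
    ]


if __name__ == "__main__":
    # "/data/xml-zips" is a hypothetical directory of zip files.
    cmd = build_mlcp_import_command("/data/xml-zips")
    print(shlex.join(cmd))
```

Locking and journaling are not mlcp options. They are changed on the target database itself, for example through the Admin Interface.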

Hope this gives you some pointers.

Ranvir
  • Thanks for your reply. I am new to ML, so what happens when locking or journaling is off? I will try the split size and thread_count options – Manish Joisar Mar 21 '19 at 11:53
  • You may refer to https://docs.marklogic.com/guide/admin/databases and search for journaling / locking – Ranvir Mar 21 '19 at 17:21
  • Thanks Ranvir, I read it, very useful. About the split size and thread_count numbers, it talks about being I/O bound. Ingestion will always be I/O bound, so on what basis can we decide? In our case it will be >12M records (in around >1000 zip files), as a one-time activity (in each environment, and then in live in future). I mean, will going with the values you mentioned above improve performance here? – Manish Joisar Mar 21 '19 at 20:26
  • After setting journaling = off, the progress % started showing properly, and setting the split size and thread count as you mentioned above also improved performance by around 25-30%. Thanks for your help – Manish Joisar Apr 04 '19 at 12:47