MarkLogic version 9.0-6.1
We have implemented two patterns for batch ingestion.
Pattern 1: MLCP
Pattern 2: Informatica (or NiFi) reads an NDJSON file and makes a MarkLogic REST API PUT call for each JSON document in the file
Our production environment is a 3-node cluster with 72 cores.
Our MLCP jobs run well with the default thread count of 4, and we run at most 3 MLCP jobs in parallel. That caps batch ingestion at roughly 12 threads, so at least 60 of the 72 cores remain available for real-time (or near-real-time) processing at any point in time.
However, I am not sure how the Informatica/NiFi batch jobs use the cores. As with MLCP, is there a way to limit the cores used by Informatica/NiFi jobs so that sufficient cores/threads stay available for real-time processing?
As we add more processes to production, we see a big increase in timeout errors for real-time REST API PUT/GET calls. These calls typically take only a few milliseconds when we run them individually, so I am guessing that contention for resources is causing the timeouts.
We have the option to scale out by adding nodes to the cluster, but this situation made me think that MLCP is a better design than REST PUT calls for batch ingestion: it gives us better control over the cores/threads used by each batch process, which keeps sufficient cores available for real-time processing. Is there a way to control/limit the resources used by NiFi when it is used for batch ingestion?
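To illustrate the kind of control I mean for Pattern 2, here is a minimal Python sketch (not our actual Informatica/NiFi flow) that sends one `PUT /v1/documents` call per NDJSON line through a fixed-size worker pool, so that at most `max_workers` requests are in flight at once. The host, port, URI scheme, and the `id` field are hypothetical, and authentication is left out for brevity.

```python
import json
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def build_put_request(base_url, uri, doc):
    """Build a PUT /v1/documents request for one JSON document."""
    query = urllib.parse.urlencode({"uri": uri})
    return urllib.request.Request(
        f"{base_url}/v1/documents?{query}",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send_to_marklogic(doc):
    """Send one document; host, port and URI pattern are assumptions."""
    req = build_put_request(
        "http://localhost:8000", f"/batch/{doc['id']}.json", doc
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def ingest_ndjson(lines, send, max_workers=4):
    """Call send() once per non-empty NDJSON line, max_workers at a time."""
    docs = [json.loads(line) for line in lines if line.strip()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # The pool size caps the number of concurrent PUT calls.
        return list(pool.map(send, docs))
```

With something like `ingest_ndjson(open("batch.ndjson"), send_to_marklogic, max_workers=4)`, the batch job would never have more than 4 concurrent PUT calls, similar to the MLCP thread count. What I do not know is whether Informatica/NiFi expose an equivalent setting.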
Any suggestions would be appreciated. Thanks in advance!