I am evaluating MarkLogic for real-time data processing. Previously I used Kafka and Storm to handle data in real time and inserted the results into a database after processing. I am new to MarkLogic, so can anybody tell me whether MarkLogic offers anything I can use to receive data in real time, process it, and then insert it into a MarkLogic database?
3 Answers
MarkLogic is extremely scalable and has features such as triggers, Alerting, and CPF (the Content Processing Framework) on which you can build your logic to decide what to do with incoming content. But a few notes to get you started:
MarkLogic is a shared-nothing architecture: the CPU and HTTP servers on each node are independent, so keep that in mind when you consider how to balance incoming messages.
MarkLogic also does not stream to disk.
MarkLogic can connect to other systems through a great HTTP client, but I do not believe there are any out-of-the-box capabilities to append content to an open connection (I believe this is related to why it also has no FTP capability).
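For the first note above, one simple way to spread incoming messages over independent nodes is to rotate through their hosts on the client side. This is a minimal Java sketch, assuming every node exposes the same app server; the class name `RoundRobinHosts` and any host names passed to it are hypothetical, and a load balancer in front of the cluster would serve the same purpose.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Rotates over cluster hosts so each node's app server receives a share of the writes. */
final class RoundRobinHosts {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinHosts(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        this.hosts = List.copyOf(hosts);
    }

    /** Thread-safe: returns the hosts in order and wraps around at the end of the list. */
    String nextHost() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int index = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(index);
    }
}
```

Each sender thread would call `nextHost()` before opening a connection, so no single node handles all incoming traffic.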
I point these items out so that you understand you are dealing with a different type of system, and the approach is simply not the same. That said, if you design your solution to match how MarkLogic works, using pre-commit triggers or just an HTTP-based application together with super-fast features like reverse queries, MarkLogic can be a great solution for real-time processing of huge amounts of data. I worked on one large implementation in which MarkLogic happily receives and processes large volumes of messages from an upstream WebSphere message broker. Some messages are handled internally and others are passed on to Splunk and other systems.
I answered your question at a high level because it does not really ask anything detailed, and MarkLogic is a large, robust product that you really need to get an overview of on your own. If you have the time, there is a free 1-day training course that covers the fundamentals, which will allow you to better understand the product and assess it for your needs.
BTW: ALL training for MarkLogic is free. Here is the link to the fundamentals course: http://www.marklogic.com/training-courses/marklogic-fundamentals/ This one can also be taken on your own time (self-paced).

Also, please take a look at the MarkLogic Java Client API, which should be usable from within Storm or Kafka. Perhaps that offers you a way to keep doing the real-time processing you're used to and then insert the data into MarkLogic using the Java API.
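As a rough illustration of the write step inside a Storm bolt or Kafka consumer, here is a minimal Java sketch. It does not use the Java Client API itself: to stay free of external dependencies it builds a plain HTTP request against the MarkLogic REST API document endpoint (`/v1/documents` with a `uri` parameter), which as far as I know is the interface the Java Client API is layered on. The class name `MarkLogicRestWrite`, the host, the port, and the document URI are assumptions for illustration, and authentication is deliberately left out.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class MarkLogicRestWrite {
    /**
     * Builds (but does not send) a PUT that stores one JSON document at docUri.
     * Host and port must point at an app server that exposes the REST API.
     */
    static HttpRequest putJson(String host, int port, String docUri, String json) {
        // The document URI travels as a query parameter, so it must be URL-encoded.
        String query = "uri=" + URLEncoder.encode(docUri, StandardCharsets.UTF_8);
        URI target = URI.create("http://" + host + ":" + port + "/v1/documents?" + query);
        return HttpRequest.newBuilder(target)
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

Sending the request is then a call such as `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())`, plus whatever authentication your app server requires. In a real project the Java Client API would normally replace this hand-built request.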

- Thanks Sam, I'll try it. – RCS Jun 10 '16 at 07:46
- I have one more question: which is the better way to load data into MarkLogic, Content Pump with a custom transformation or the Java API in a multi-threaded environment? – RCS Jun 10 '16 at 09:53
- Can MarkLogic Content Pump be used for real-time processing of streaming data and store it in a MarkLogic database? – Pradeep Kumar T R Jun 10 '16 at 10:02
- Content Pump is for reading files off the file system. If your flow can write to the file system, Content Pump can read it from there. If you want to stream data in a multi-threaded environment, use the Java API. – Sam Mefford Jun 16 '16 at 16:33
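The multi-threaded approach mentioned in the last comment can be sketched as a fixed worker pool that writes documents in batches. This is only an illustration under stated assumptions: `ParallelIngest` and the `writeBatch` callback are hypothetical names, and the callback stands in for whatever actually writes to MarkLogic (for example, a call into the Java Client API).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

final class ParallelIngest {
    /** Splits docs into batches, writes them on a fixed thread pool, and waits for all of them. */
    static void ingest(List<String> docs, int batchSize, int threads,
                       Consumer<List<String>> writeBatch) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int from = 0; from < docs.size(); from += batchSize) {
                List<String> batch = docs.subList(from, Math.min(from + batchSize, docs.size()));
                pending.add(pool.submit(() -> writeBatch.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // blocks until the batch is written; surfaces any write failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while ingesting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a batch failed to write", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The callback must be safe to call from several threads at once. Batch size and thread count are tuning knobs that depend on document size and cluster capacity, so treat any particular values as a starting point for measurement.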
There is an open-source Kafka Connect sink connector for MarkLogic. Please take a look at https://github.com/sanjuthomas/kafka-connect-marklogic
You may be able to use Kafka as a buffer when you stream high-velocity data to MarkLogic. If MarkLogic's write throughput is acceptable, you can transform or process the data at ingestion time using a custom REST endpoint. I wouldn't consider the previous-generation trigger- and CPF-based transformations a scalable solution. More importantly, debugging CPF pipeline issues is not something you want to do when mature stream-processing frameworks and tools are available in the open-source world.

- I sincerely hope whoever downvoted this answer did so for purely technical reasons and would be willing to explain. – Fan Li Sep 17 '18 at 21:19