I am reading about both Kafka and JMS, mainly Kafka, and comparing the two to understand them better.

Kafka guarantees ordered delivery and supports multiple subscribers. How does Kafka achieve this?

Kafka topics have multiple partitions. With one consumer per partition, ordering is guaranteed within each partition, and spreading messages over multiple partitions gives us load balancing. So both are possible at the same time.

In JMS, if we have multiple queues, isn't that the same as Kafka?

Q1: Which is better in this scenario?

Q2: Am I looking at this too narrowly? Does Kafka do more than this?

Please advise me.

If I am wrong about JMS, please let me know as well.

Gibbs
  • Btw, there's also the option to use Apache Kafka on the server side and continue to use a JMS client on the, well, client side. Confluent provides such a JMS client for Kafka: http://docs.confluent.io/current/clients/kafka-jms-client/docs/index.html – miguno Mar 09 '17 at 10:20

2 Answers

I was asking myself the same question before :)

As you wrote, Kafka guarantees ordered delivery only within a single partition. Period. If you are using multiple partitions (which you need in order to get parallelism), then a consumer that reads from several partitions may receive message A from partition 1 before message B from partition 2, even though message B arrived at the broker first.
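To make this concrete, here is a toy model in plain Python. It does not use the Kafka client API: the two lists stand in for partitions, the arrival times are made up, and `poll_round_robin` is a hypothetical consumer that simply alternates between partitions.

```python
from itertools import zip_longest

# Toy model, not Kafka's API: each partition is an append-only list of
# (arrival_time, message) records. All names and values are illustrative.
partitions = {
    1: [(2, "A"), (4, "C")],  # A arrived at t=2, C at t=4
    2: [(1, "B"), (3, "D")],  # B arrived at t=1, D at t=3
}

def poll_round_robin(parts):
    """Hypothetical consumer: read one record from each partition in turn."""
    consumed = []
    for batch in zip_longest(*(parts[p] for p in sorted(parts))):
        consumed.extend(rec for rec in batch if rec is not None)
    return consumed

consumed = poll_round_robin(partitions)
messages = [m for _, m in consumed]
print(messages)  # ['A', 'B', 'C', 'D']
```

Within each partition the order is preserved (A before C, B before D), but the consumer sees A before B even though B arrived first.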

Now, about the differences between Kafka and JMS. In JMS, you have queues and topics. With a queue, once one consumer has consumed a message, no other consumer can take it. With a topic, multiple consumers each receive every message, but this is much harder to scale. Kafka's consumer group is a generalization of these two concepts: it allows load to be shared among the members of one consumer group, and it also allows the same message to be broadcast to many different consumer groups.
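A rough sketch of that generalization, again as a plain-Python toy rather than real client code. The group and member names are hypothetical, and the per-message round-robin inside a group is a simplification: in Kafka the sharing happens by assigning whole partitions to group members.

```python
from collections import defaultdict

def deliver(messages, groups):
    """Toy model: every group sees every message; inside a group,
    each message goes to exactly one member (round-robin here)."""
    received = defaultdict(list)
    for i, msg in enumerate(messages):
        for group, members in groups.items():
            member = members[i % len(members)]
            received[(group, member)].append(msg)
    return dict(received)

groups = {
    "billing": ["b1", "b2"],  # two members share the load (queue-like)
    "audit": ["a1"],          # a separate group gets its own full copy (topic-like)
}
received = deliver(["m0", "m1", "m2", "m3"], groups)

print(received[("billing", "b1")])  # ['m0', 'm2']
print(received[("billing", "b2")])  # ['m1', 'm3']
print(received[("audit", "a1")])    # ['m0', 'm1', 'm2', 'm3']
```

A single group with several members behaves like a JMS queue, while several single-member groups behave like a JMS topic.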

An even more important difference is the following. Imagine that you have a Kafka topic with 500 partitions and, on the other hand, 500 JMS message queues. Let's also imagine that you have a certain number of producers and consumers. In the JMS case, you need to configure each of them so that it knows which queues belong to it. What if a consumer crashes, or you find that you need more consumers? You have to reconfigure the whole system manually. With Kafka this comes for free: Kafka rebalances partitions among the consumers of a group automatically, which is an extremely useful feature.
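The idea behind rebalancing can be sketched as recomputing the partition assignment from the current list of live consumers. The `assign` function below is a hypothetical round-robin helper for illustration only; Kafka's actual assignment strategies are more involved and may distribute partitions differently.

```python
def assign(num_partitions, consumers):
    """Toy round-robin assignment of partitions to the live consumers."""
    assignment = {c: [] for c in consumers}
    for p in range(num_partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment

before = assign(6, ["c1", "c2", "c3"])
after = assign(6, ["c1", "c3"])  # c2 crashed; the survivors pick up its partitions

print(before)  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
print(after)   # {'c1': [0, 2, 4], 'c3': [1, 3, 5]}
```

No producer or consumer configuration changes here; only the membership list does.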

Finally, Kafka typically achieves much higher throughput, mostly because of its sequential disk I/O, its reliance on the OS page cache, and zero-copy transfers, and because the consumers, not the broker as in JMS, keep track of which messages they have consumed. Because of this, a consumer is also able to "rewind", i.e. reread messages from, say, 2 days ago, as long as they are still within the topic's retention period.
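The consumer-side bookkeeping is what makes rewinding cheap. Here is a minimal sketch in plain Python; the `Log` and `Consumer` classes are hypothetical stand-ins, not the Kafka client, although the real Java consumer does offer a comparable `seek` operation.

```python
class Log:
    """Toy append-only log: the broker keeps records and does not track readers."""
    def __init__(self):
        self.records = []

    def append(self, value):
        self.records.append(value)
        return len(self.records) - 1  # offset of the new record

class Consumer:
    """The consumer, not the broker, remembers its position (offset)."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self, max_records=10):
        batch = self.log.records[self.offset:self.offset + max_records]
        self.offset += len(batch)
        return batch

    def seek(self, offset):
        self.offset = offset  # rewinding is just moving the position back

log = Log()
for i in range(5):
    log.append(f"e{i}")

consumer = Consumer(log)
first = consumer.poll()   # reads everything once
second = consumer.poll()  # nothing new
consumer.seek(2)          # rewind
replay = consumer.poll()

print(first)   # ['e0', 'e1', 'e2', 'e3', 'e4']
print(second)  # []
print(replay)  # ['e2', 'e3', 'e4']
```

Reading a message does not remove it from the log, which is the key contrast with a JMS queue.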

Bhesh Gurung
Miljen Mikic
  • You mentioned mostly the pros for Kafka :) I'd point out a feature that JMS has and Kafka doesn't: filtering. In JMS you can specify an SQL-like selector to pick only the messages you need. Kafka has no such functionality, since you always get whatever message your consumer's offset points to. I'm not saying it's a huge difference, but it's good to know. – Guigreg Dec 14 '18 at 16:21
  • @Guigreg Well, Kafka is thrilling :) Regarding the filtering, supposedly Apache Kafka 0.10.2 includes a feature in Kafka Connect called transformations that enables filtering; I haven't tried it yet, though. – Miljen Mikic Dec 14 '18 at 16:36
  • @Guigreg You could achieve the same functionality with Kafka Streams. See: https://docs.confluent.io/current/streams/concepts.html – zwessels Jul 01 '19 at 08:50
  • I think it should be mentioned that, although the underlying concept is the same for both, Kafka uses HDFS, which makes it highly scalable, durable and reliable. For example, the New York Times chose to put all of their newspaper content since 1867 in Apache Kafka. – old-monk Jan 09 '21 at 02:07
  • @old-monk Kafka stores data in local files. Where did you get that Kafka uses HDFS? – Miljen Mikic Jan 11 '21 at 10:40
  • @Miljen Mikic Thanks for correcting me. It does not use HDFS. It is a distributed system, however. – old-monk Jan 11 '21 at 16:29
Here's a fairly good article on the differences: http://blog.hampisoftware.com/index.php/2016/01/20/apache-kafka-differences-from-jms/

Kafka does not guarantee message ordering across multiple partitions of a topic. Order is maintained only within a partition. To achieve strict ordering across all messages of a topic, you need to use a single partition for that topic.

abstax
  • That's not exactly true. You can get ordering on a topic with multiple partitions: you only need to set the SAME KEY on all the messages that must stay ordered. Those messages will go to the same partition, so if you need ordering across ALL messages you will end up using only one partition. But in most cases the messages of a topic come from different sources and you only need ordering per source. In those cases you can have many partitions and, by using the message KEY, keep the order among the messages that need it while still getting parallel processing thanks to the partitions. – Aleja_Vigo Mar 08 '17 at 10:49
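The key-based approach from the comment above can be sketched as follows. This is a plain-Python toy: `partition_for` and `produce` are hypothetical helpers, and CRC32 is only a stand-in for whatever stable hash the real partitioner uses.

```python
import zlib

def partition_for(key, num_partitions):
    """Stand-in partitioner: stable hash of the key modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def produce(events, num_partitions):
    """Append each (key, value) event to the partition chosen by its key."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in events:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions

# Two sources interleaved; ordering only matters per source.
events = [("sensor-a", 1), ("sensor-b", 1), ("sensor-a", 2),
          ("sensor-b", 2), ("sensor-a", 3)]
partitions = produce(events, 4)

for i, records in enumerate(partitions):
    print(i, records)
```

All events with the same key land in one partition, in the order they were produced, while different keys are free to spread across partitions and be processed in parallel.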