0

Imagine a scenario in which a producer is producing 100 messages per second, and we're working on a system in which consuming messages as soon as possible matters a lot; even a 5-second delay might result in a decision not to process that message anymore. Also, the order of messages does not matter.

So I don't want to use a basic queue with a single pod listening on a single partition to consume messages, since in order to consume a message the consumer needs to make multiple remote API calls, and this takes time.

In such a scenario, I'm thinking of a single Kafka topic with 100 partitions, and for each partition a separate machine (pod) listening, i.e. one consumer each for partitions 0 to 99.

Am I thinking about this the right way? This is my first project with Kafka, and this approach seems a little weird to me.

behz4d

2 Answers

1

Your bottleneck is your application processing the event, not Kafka. When you have ten consumers, there is overhead for connecting each consumer to Kafka, which will lower performance. I advise focusing on your application's performance rather than on the message broker.

Kafka's p99 latency is 5 ms under a 200 MB/s load.

https://developer.confluent.io/learn/kafka-performance/

Arash
  • I would take those results with a bucket of salt :D – in those tests the latency is low because there are lots of messages flowing through (200k messages/s), and as a consequence producers don't need to wait `linger.ms` to send batches of messages to the brokers. – Augusto Oct 27 '22 at 10:16
  • Depending on the event size or conditions, of course. 100 messages/s should not be a problem with almost any tech, and I personally wouldn't use Kafka for this. – Arash Oct 27 '22 at 16:07
1

For your use case, think of partitions = the maximum number of instances of the service consuming the data. Don't create extra partitions if you'll have 8 instances: this will have a negative impact if consumers need to be rebalanced and probably won't give you any performance improvement. Also, 100 messages/s is very, very little; you can make this work with almost any technology.

To get the maximum performance, I would suggest using a Parallel Consumer, so that a single consumer instance can process many records concurrently.

There are also a few producer and consumer properties that you'll need to tune, but they depend on your environment, for example `batch.size`, `linger.ms`, etc. I would also reconsider the need to set `acks=all`, as it might be OK for you to lose data if a broker dies, given that old data is of no use to you.
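As a rough illustration of the properties mentioned above, here is a minimal sketch of a producer configuration tuned for latency rather than throughput. The broker address is a placeholder, the exact values depend on your environment, and only standard Kafka property names are used:

```java
import java.util.Properties;

public class LowLatencyProducerConfig {

    // Build producer properties biased toward low latency.
    static Properties lowLatencyProps() {
        Properties props = new Properties();
        // Placeholder broker address; replace with your cluster.
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        // linger.ms=0: send each record immediately instead of waiting
        // to fill a batch (trades throughput for latency).
        props.put("linger.ms", "0");
        // acks=1: the partition leader's ack is enough; losing a few
        // records on broker failure may be acceptable here, since stale
        // data is useless to this use case anyway.
        props.put("acks", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = lowLatencyProps();
        System.out.println("linger.ms=" + props.getProperty("linger.ms"));
        System.out.println("acks=" + props.getProperty("acks"));
    }
}
```

These properties would be passed straight to a `KafkaProducer` constructor; the point is simply which knobs move latency, not the exact values.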

One warning: in Java, the standard Kafka consumer is single-threaded. This surprises many people, and I'm not sure whether the same is true on other platforms. Because of this, having 100s of partitions won't give any performance benefit with these consumers, and that's why it's important to use a Parallel Consumer.
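The core idea behind a Parallel Consumer is a hand-off: one thread polls, and a pool of workers does the slow remote calls. Below is a minimal stdlib-only sketch of that pattern, with plain strings standing in for `ConsumerRecord`s and the slow API calls; all names are hypothetical, and a real implementation would also have to manage offset commits carefully:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class HandOffSketch {

    // Processes one "poll batch": the calling thread stays single-threaded
    // (like KafkaConsumer.poll()), while the slow per-record work runs on
    // a worker pool so records are handled concurrently.
    static List<String> processBatch(List<String> polledRecords, int workers)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<String> results = new CopyOnWriteArrayList<>();
        for (String record : polledRecords) {
            // Stand-in for the multiple remote API calls per message.
            pool.submit(() -> results.add("processed:" + record));
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> out = processBatch(List.of("a", "b", "c"), 3);
        System.out.println(out.size()); // 3
    }
}
```

With this pattern, concurrency is bounded by the worker pool size, not by the partition count, which is why a single instance with a handful of partitions can still keep up with slow downstream calls.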

One more warning: Kafka is a complex broker. It's trivial to start using it, but it's a very bumpy journey to use it correctly.

And a note: one of the benefits of Kafka is that it keeps messages rather than deleting them once they are consumed. If messages older than 5 seconds are useless to you, Kafka might be the wrong technology, and a more traditional broker might be easier (ActiveMQ, RabbitMQ, or a blazing-fast one like ZeroMQ).

Augusto
  • I'm gonna have 100 consumers (different pods on different k8s nodes), each consuming from a different partition of a topic, so not single-threaded anymore, right? And I know there are easier-to-use brokers; I'm just trying to learn. – behz4d Oct 27 '22 at 10:38