Consuming thousands of topics vs thousands of partitions on 1 topic for Apache Kafka Consumer

Question

In a development environment, someone has a Java application that shows thousands of open connections, even though it listens on only about 200 topics from a single consumer. The topics are spread over 3 brokers.

Locally, I used Docker and a plain Kafka Consumer in a Java app, and created 1 topic with 300 partitions. At most, it opened 2 connections.

I also tested subscribing locally to 10 topics with about 3 to 300 partitions each, and saw the same number of TCP connections as when subscribing to 1 topic with a few hundred partitions. My guess is that the connection count stayed low because everything shares the same broker on localhost.
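To make that guess concrete, here is a rough model of it in plain Java (no Kafka dependency). It assumes the Java client keeps at most one fetch connection per broker that leads a partition it reads, plus one separate connection to the group coordinator; a short-lived bootstrap connection is ignored. The class name, method name, and topic names are made up for illustration, and the "+ 1" is an assumption about client behavior, not a measured result.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ConnectionEstimate {
    /**
     * leadersByTopic maps a topic name to the leader broker id of each of its
     * partitions (list index = partition number). Hypothetical input shape.
     */
    static int estimateConnections(Map<String, List<Integer>> leadersByTopic) {
        Set<Integer> leaderBrokers = new HashSet<>();
        for (List<Integer> leaders : leadersByTopic.values()) {
            leaderBrokers.addAll(leaders);
        }
        // Assumption: one connection per distinct leader broker,
        // plus one dedicated connection to the group coordinator.
        return leaderBrokers.size() + 1;
    }

    public static void main(String[] args) {
        // Local test: 1 topic, 300 partitions, all led by the only broker (id 0).
        Map<String, List<Integer>> local = new HashMap<>();
        local.put("single-topic", Collections.nCopies(300, 0));
        System.out.println(estimateConnections(local)); // prints 2

        // Dev-like setup: 200 topics, 3 partitions each, leaders on brokers 0, 1, 2.
        Map<String, List<Integer>> dev = new HashMap<>();
        for (int t = 0; t < 200; t++) {
            dev.put("topic-" + t, Arrays.asList(0, 1, 2));
        }
        System.out.println(estimateConnections(dev)); // prints 4
    }
}
```

Under these assumptions the estimate depends on the number of distinct brokers, not on the number of topics or partitions, which would be consistent with the 2 connections seen against a single local broker.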

My question is, would the number of connections to Kafka increase on a consumer if we had an application listening on a couple hundred topics from a single consumer?

I know that the Kafka best practice is to use many partitions instead of many topics. I proposed a single topic with a couple thousand partitions, since the official Kafka FAQ recommends more partitions over more topics. A StackOverflow answer recommends this too: Can I have 100s of thousands of topics in a Kafka Cluster?

I'm trying to show why it's beneficial to create many partitions instead of many topics. If anyone has input or real-life production experience, that would be helpful too.

All read and write requests only happen with the leader broker, regardless of the number of partitions. I think that is what you are seeing. Topics can have different leaders. — OneCricketeer, Apr 05 '19 at 23:07
With regard to the number of topics: it depends on the data. For example, you can't use different serializers for different partitions, and tossing in random strings or byte blobs often yields hard-to-read parsing logic. If you just have "metrics" or "logs" topics, then those have some defined schema to them. But if you encode "payments", as a trivial example, without saying what type of currency, then that translation logic needs to be applied somewhere before any reporting is done. Plus, more partitions affects ordering, if that really matters for consumers. — OneCricketeer, Apr 05 '19 at 23:14
"Topics can have different leaders" but can partitions of the same topic have different leaders? — Aimee, Apr 19 '19 at 17:41
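On that last point: leadership in Kafka is tracked per partition, so partitions of the same topic can be led by different brokers. The sketch below illustrates this with a deliberately simplified round-robin placement. It is not Kafka's actual assignment algorithm (which, among other things, randomizes the starting broker and can take racks into account), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LeaderSpread {
    /**
     * Returns a leader broker id for each partition of one topic, using a
     * simplified round-robin rule (an assumption for illustration only).
     */
    static List<Integer> assignLeaders(int partitions, int brokers) {
        List<Integer> leaders = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            leaders.add(p % brokers);
        }
        return leaders;
    }

    public static void main(String[] args) {
        // One topic with 6 partitions on a 3-broker cluster.
        System.out.println(assignLeaders(6, 3)); // prints [0, 1, 2, 0, 1, 2]
    }
}
```

With a spread like this, a consumer reading every partition of a single topic still talks to all three brokers, just as it would when reading many small topics.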
