I am exploring different PubSub platforms and was wondering what the limits are in Kafka for listening to multiple topics. Consider, for instance, this use case: we have trains, station entry gates, and other devices that all publish their telemetry. Currently this is done on an MQ, but as data rates increase (smart trains, etc.) we need to move to a new PubSub/streaming platform, and Kafka is of course on that list.

As I see it there are two strategies for aggregating this telemetry into a stream:

  1. aggregate on consumption, in which each train/device initially gets its own topic and the topics are aggregated using a regex-topic / virtual topic
  2. aggregate on production, in which all trains produce to a single topic and consumers use filters if necessary to single out individual producers

As I understand it, Kafka is not particularly suited to a high number of topics (>10,000), but it could be done. Would a regex-topic be able to aggregate 2,000 or 3,000 topics?
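
To make the two strategies concrete, here is a minimal Python sketch that only simulates them in memory; no Kafka client or broker is involved. The topic naming scheme `telemetry.train.<id>`, the `source` field and the record layout are assumptions made up for illustration, not anything Kafka prescribes.

```python
import re

# Assumed naming scheme: one topic per train and one per gate (hypothetical names).
topics = [f"telemetry.train.{i}" for i in range(3000)]
topics += [f"telemetry.gate.{i}" for i in range(500)]

# Strategy 1: aggregate on consumption.
# A pattern subscription selects every topic whose name matches the regex.
train_pattern = re.compile(r"telemetry\.train\.\d+")
matched_topics = [t for t in topics if train_pattern.fullmatch(t)]

# Strategy 2: aggregate on production.
# All producers write to one topic; the consumer filters on a producer id
# carried in each record (here a hypothetical "source" field).
records = [{"source": f"train-{i % 3}", "speed_kmh": 80 + i} for i in range(6)]
only_train_1 = [r for r in records if r["source"] == "train-1"]

print(len(matched_topics), "topics matched by the pattern")
print(len(only_train_1), "records kept by the filter")
```

In the first strategy the cost grows with the number of topics the pattern has to match; in the second it grows with the number of records each consumer has to read and discard.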

Patrick Savalle
  • Just hit an issue with subscribing to more than 10,000 topics in Kafka. Would be nice to get some details on this. – inwenis Jul 15 '21 at 12:59
  • On the same Kafka cluster we have multiple consumer groups, each with more than 400 topics matching the topic regex. We have never experienced any problems. – raphaelauv Jul 18 '21 at 21:44
  • @raphaelauv I have one special consumer group subscribing to 10K+ topics and couldn't get it to work. For now we have split the consumer group into two, each subscribing to ~5K topics. – inwenis Jul 20 '21 at 11:17

1 Answer

From a technical point of view it can be done, but in practice it is not common. Why? ZooKeeper. It is advised that a cluster have at most 4,000 partitions per broker. This is partly due to the overhead of performing leader election for all of those partitions on ZooKeeper.
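
As a rough back-of-the-envelope check against that guideline, the sketch below estimates how many partition replicas each broker would host. The cluster shape (3 brokers, replication factor 3, one partition per topic), the even spread across brokers, and the choice to count replicas toward the limit are all assumptions for illustration.

```python
import math

PARTITIONS_PER_BROKER_GUIDELINE = 4000  # figure quoted in the answer above


def replicas_per_broker(topics, partitions_per_topic, replication_factor, brokers):
    """Estimate partition replicas hosted per broker, assuming an even spread."""
    total_replicas = topics * partitions_per_topic * replication_factor
    return math.ceil(total_replicas / brokers)


# Hypothetical cluster: 3 brokers, replication factor 3, one partition per topic.
load_3k = replicas_per_broker(3_000, 1, 3, 3)
load_10k = replicas_per_broker(10_000, 1, 3, 3)

for name, load in (("3,000 topics", load_3k), ("10,000 topics", load_10k)):
    verdict = "within" if load <= PARTITIONS_PER_BROKER_GUIDELINE else "over"
    print(f"{name}: {load} replicas per broker ({verdict} the guideline)")
```

Under these assumptions one topic per device stays inside the guideline at 3,000 topics but exceeds it at 10,000, unless brokers are added or the replication factor is lowered.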

I recommend reading these posts about this interesting topic on Confluent's blog:

marcosluis2186