
I'm considering Kafka as one of several technologies to serve as a message broker for worker nodes that will eventually send push notifications to users. An important constraint is that I don't want one tenant to monopolize resources: if one tenant inserts a million notification messages, it shouldn't prevent other tenants from receiving their notifications in a reasonable time. In other words, I want each tenant's messages to be processed at about the same rate. My options seem to be either to create a topic for each tenant or a partition for each tenant. Both seem problematic and/or frowned upon.

Creating a topic for each tenant seems like a logistical nightmare. Every time a new tenant gets added to the application, the consumers would somehow have to be notified to subscribe to the new topic.

Creating a partition for each tenant doesn't seem quite as bad, but it seems to be frowned upon. However, based on my understanding of how load is distributed between partitions and consumers, if multiple tenants shared the same partition there is a possibility that one tenant's messages will get stuck behind another's, which is not how I want to balance the load.
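To make the partition-per-tenant option concrete, here is a rough sketch of what I have in mind (the topic name, tenant names, and the tenant-to-partition lookup are made up purely for illustration, not working production code):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Map;
import java.util.Properties;

public class TenantPartitionSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical tenant -> partition lookup, maintained somewhere else (DB, config, ...).
        Map<String, Integer> tenantToPartition = Map.of("tenantA", 0, "tenantB", 1);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String tenant = "tenantA";
            // Passing an explicit partition index pins each tenant to "its" partition,
            // so a backlog from one tenant only ever sits in that tenant's partition.
            producer.send(new ProducerRecord<>("notifications",
                    tenantToPartition.get(tenant), tenant, "{\"push\":\"hello\"}"));
        }
    }
}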

What is my best option? Is there a third possibility I'm not considering? Is Kafka not the right tool for the job?

Thanks!

Alex Denton
  • Is a "tenant" equivalent to a "consumer group"? – OneCricketeer May 01 '19 at 20:15
  • I wouldn't say so. This is a situation where you have a SaaS application with distinct sets of users that can't communicate with one another and shouldn't interfere with one another. However, their data would either be going into the same topic or on different partitions or each being given a topic. – Alex Denton May 01 '19 at 20:35
  • Well, if no two tenants are allowed to read each other's data, then you would definitely need separate topics, not just partitions. If you don't want a flood of events from one tenant to slow down anybody else, then you'd need isolated clusters... – OneCricketeer May 01 '19 at 20:41
  • Since this is being used strictly for internal orchestration, it doesn't seem unreasonable to let the consumers handle the logic of dealing with the tenants. My primary concern is with load distribution. Having to create a new topic for each tenant seems wrong when we're talking about thousands of tenants at a time, with more being created every day. Registering consumers with new topics also seems overcomplicated. That could definitely be the accepted best practice; I'm just speaking from intuition. Are there any references you can point me to? – Alex Denton May 01 '19 at 20:53
  • Sorry, I do not. I don't operate clusters with more than a few hundred developers using them. – OneCricketeer May 01 '19 at 21:34
  • No worries. I appreciate the input! – Alex Denton May 02 '19 at 18:05
  • Do you have a reasonable solution now @AlexDenton ? I meet the same exact problem – addlistener Dec 09 '19 at 14:50

2 Answers


If you let multiple "tenants" share a partition, your fear of one tenant hijacking a partition might come true. In that case, you may have no choice other than to create a topic per tenant. How could you handle the administrative overhead?

  • You could set auto.create.topics.enable to true so that a tenant could create a topic just by sending a message to it.
  • Registering dynamically created topics with consumers is not complicated if your topic names follow a pattern: your consumers should subscribe to all topics that match the given pattern (a short sketch follows below).
public void subscribe(java.util.regex.Pattern pattern)
Subscribe to all topics matching specified pattern to get dynamically assigned 
partitions. The pattern matching will be done periodically against topics 
existing at the time of check.

How quickly consumers can detect new topics is configurable using metadata.max.age.ms (the default is 5 minutes).
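Here is a minimal sketch of such a consumer, assuming tenant topics follow a naming pattern like tenant-<id>; the bootstrap server, group id, topic prefix, and refresh interval are placeholders:

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;

public class PatternSubscribeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "push-notification-workers");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        // Controls how quickly newly created tenant topics are noticed (default 300000 ms).
        props.put("metadata.max.age.ms", "60000");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Matches every topic named tenant-<something>, including ones created later.
            consumer.subscribe(Pattern.compile("tenant-.*"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> System.out.printf("%s -> %s%n", r.topic(), r.value()));
            }
        }
    }
}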

If you are going to create thousands of topics, you might want to check the performance impact first, though.

senseiwu
  • The problem, though, is that all automatically created topics would have the same number of partitions, so that property must be set according to hypothetical future load, and that default cannot be changed without restarting the cluster, I think – OneCricketeer May 01 '19 at 23:27
  • If he creates a topic per tenant, he could probably live with the default number of partitions already set in `server.properties`, which is 1 - provided all his messages fit on one broker. If he wants to change it there, yes, a restart will be necessary. – senseiwu May 02 '19 at 03:31
  • That's definitely good to know about `auto.create.topics.enable`. It still seems like it doesn't help from the consumer side; I'd have to build some sort of system to make sure consumers get registered to newly created topics. That's one of the things that makes me think one topic with a partition for each tenant is the way to go. Obviously I don't know what I'm talking about, though; just going off my own observations and intuitions. – Alex Denton May 02 '19 at 18:09
  • As I mentioned, "registration" of consumers happens when they subscribe to a topic using a pattern. The only constraint you have is that your topics should follow some naming pattern, e.g. mytopic-1, mytopic-2, etc. – senseiwu May 02 '19 at 18:26
  • Ah, I totally missed that somehow. Sorry for not reading more carefully. That's a good tip. – Alex Denton May 03 '19 at 11:37

One solution that I can think of, assuming you are using AWS:

[topic1] --> [kafka consumer]
                  -->
               [s3://bucket/tenant1]  --> Listener --> nonjava-Lambda
               [s3://bucket/tenant2]  --> Listener --> nonjava-Lambda
               [s3://bucket/tenant3]  --> Listener --> nonjava-Lambda
  1. On S3, have one folder per tenant, and configure an S3 event listener at the tenant-folder level.
  2. On the topic, have a Kafka consumer that dumps each tenant's batch of messages into that tenant's folder (so some files might hold 1 message, others 100); a sketch of this consumer is further below.

Since Kafka is super fast (around 20k 800-byte messages/sec can be dequeued), all you have to do is implement the S3 listener Lambda (in Go/Python/Node.js, not Java) and get the work done.

You may say that under high load the overall throughput could decrease significantly because writing to S3 is involved (roughly 300 msgs/sec on average); but remember that you are writing in batches. By the time you complete the first write, enough messages have accumulated in the topic that they all go into one file in the next S3 write. So my wild guess is that overall throughput may decrease, but not drastically.
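Roughly, the consumer-to-S3 part of step 2 could look like the sketch below. The bucket name, the assumption that the tenant id is the record key, and the per-batch file naming are illustrative only; it uses the AWS SDK for Java v2:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

import java.time.Duration;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.UUID;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class TenantS3DumpSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "s3-dumper");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             S3Client s3 = S3Client.create()) {
            consumer.subscribe(Collections.singletonList("topic1"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                // Group the polled batch by tenant (assumed here to be the record key).
                Map<String, List<String>> byTenant = StreamSupport
                        .stream(records.spliterator(), false)
                        .collect(Collectors.groupingBy(ConsumerRecord::key,
                                Collectors.mapping(ConsumerRecord::value, Collectors.toList())));
                // One object per tenant per batch: s3://bucket/<tenant>/<uuid>.json
                byTenant.forEach((tenant, messages) -> s3.putObject(
                        PutObjectRequest.builder()
                                .bucket("bucket")
                                .key(tenant + "/" + UUID.randomUUID() + ".json")
                                .build(),
                        RequestBody.fromString(String.join("\n", messages))));
            }
        }
    }
}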

chendu