
We have a requirement to purposefully unbalance a Kafka cluster. For a 3-broker cluster, the traffic should be distributed as 80% to Broker 1, 15% to Broker 2, and 5% to Broker 3, and the messages for the topics should be sent to the brokers according to this distribution.

To implement this logic in Python using kafka-python, we call the produce_unbalanced_message function from within the main function. A sample of the code implementing the logic is provided below:

Main Function

def mf():
   .
   .
   . 
    # Create the topics if they don't exist. tps_crtn creates new topics if none are found; otherwise messages are sent to the existing topics as usual.
    tpc_list = tps_crtn(base_topic_name=bt, no_of_topics=int(ntp), 
                    topic_partn=int(ptp), 
                    repicas_per_partn=int(rpp))
    #traffic distribution list
    dl = [80,15,5]
    while True:
        for ix, topic in enumerate(tpc_list):
            produce_unbalanced_message(topic_name=topic,
                no_of_msgs=int(round(int(nm) * (float(dl[ix]) / 100.0))),
                max_wait_time=float(mwt))
if __name__ == "__main__":
    mf()

The main function calls the producer function below to send messages to every topic in the topic list.

Unbalanced produce-message function

def produce_unbalanced_message(topic_name='test-topic',
                               no_of_msgs=-1,
                               max_wait_time=2):
    kafka_admin_client: KafkaAdminClient = KafkaAdminClient(
        bootstrap_servers='10.22.151.16:9100'
    )
    .
    .
    # List of all node ids in the cluster
    LOG.info("Fetch the existing Kafka node list")
    nodeids: List[int] = [node.nodeId for node in kafka_admin_client._client.cluster.brokers()]
    for n in nodeids:
        print(n)
    .
    .
    .
    # sending unbalanced messages to Kafka
    producer.send(topic_name,
                  key=key,
                  value=message)
    .
    .

As per the requirement, messages should be sent according to the broker IDs and the corresponding traffic distribution list, not according to the topic list. We get the broker IDs from the nodeids list in the produce_unbalanced_message function.

However, when we test this code with more than three topics, we get an index-out-of-bounds error on dl[ix]. The reason is that as soon as the topic count increases, the traffic list distribution values are not matching as they are set according to the broker.

Can anyone please suggest what changes to try so that messages are sent according to the broker IDs obtained from the nodeids list and the corresponding traffic distribution list, and not according to the topic list?

Bhuvi
  • You're going to want to define your own partitioner function. Otherwise, you cannot target a specific broker by its ID for records (keys go through a hash function, and you're not guaranteed hash collisions won't happen here) – OneCricketeer Jul 12 '21 at 23:00
  • @OneCricketeer Let's say we define a partitioner function in which we get the node IDs for the brokers. Within a for loop that traverses the node IDs, can we then call the static if-block you mention below to send the messages accordingly? – Bhuvi Jul 14 '21 at 14:11
  • Broker IDs don't map to partitions, so I don't understand how that will help you. You need to describe the topics, then find the partition leaders, make sure there is no overlap, and ensure the broker doesn't rebalance the partitions while you're producing (or use only one replica) – OneCricketeer Jul 14 '21 at 14:54

1 Answer


the traffic list distribution values are not matching as they are set according to the broker

You have your two lists coupled. If you ever have more or fewer topics than entries in the "distribution list", you cannot use an index from one list to access the other.
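One way to decouple them, as a minimal sketch: key the weights by target name instead of by list position, and derive the per-target message counts from that mapping. The helper name split_messages and the topic names are hypothetical, not taken from the question's code.

```python
def split_messages(total, weights):
    """Split `total` messages across targets in proportion to `weights`.

    `weights` maps a target name to a relative weight; the weights do not
    need to sum to 100. Rounding leftovers go to the largest fractional
    parts, so the counts always add up to `total`.
    """
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    exact = {name: total * w / weight_sum for name, w in weights.items()}
    counts = {name: int(share) for name, share in exact.items()}
    leftover = total - sum(counts.values())
    # Hand out the remaining messages, largest fractional part first.
    by_fraction = sorted(exact, key=lambda name: exact[name] - counts[name],
                         reverse=True)
    for name in by_fraction[:leftover]:
        counts[name] += 1
    return counts


# Hypothetical topic names; a topic not listed here simply gets no traffic.
distribution = {"t1": 80, "t2": 15, "t3": 5}
plan = split_messages(1000, distribution)
```

Because the lookup is by name, adding a fourth topic elsewhere in the program cannot push an index past the end of the weights.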

IMO, it's more readable if you have a statically defined if-block because there's no real need to fetch the list of topics to create an unbalanced cluster.

And if you want the distribution to add up to 100%, just draw a random number in [0, 1) and pick the topic by the range it falls in:

import random

while True:
    value = random.random()
    topic = None
    if 0 <= value < 0.80:
        topic = 't1'
    elif 0.80 <= value < 0.95:
        topic = 't2'
    else:
        topic = 't3'

    print('Produce to topic ' + topic)
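The same selection can also be written without hard-coded thresholds by using the standard library's random.choices, which accepts relative weights. This is only a sketch: the topic names are hypothetical, and the generator is seeded purely so the run is repeatable.

```python
import random
from collections import Counter

topics = ["t1", "t2", "t3"]   # hypothetical topic names
weights = [80, 15, 5]         # relative traffic share per topic

rng = random.Random(0)        # seeded only to make this example repeatable


def pick_topic():
    """Return one topic name, chosen with probability proportional to its weight."""
    return rng.choices(topics, weights=weights, k=1)[0]


# Simulate 10,000 sends and tally where they would go.
tally = Counter(pick_topic() for _ in range(10_000))
```

If the two lists ever differ in length, random.choices raises a ValueError up front rather than failing partway through a loop.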

You will also want to verify that the topics have only one replica and are hosted by different brokers if you really want the cluster to be "unbalanced".
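A sketch of that check, written against topic metadata shaped like what kafka-python's KafkaAdminClient.describe_topics() returns: a list of dicts, each with a 'topic' name and a 'partitions' list whose entries carry 'leader' and 'replicas'. That shape is an assumption here, so confirm it against your client version. The function name and the sample data are made up for illustration.

```python
def check_unbalanced_layout(topic_metadata):
    """Return a list of problems that would blur the intended imbalance.

    Each topic should have single-replica partitions that are all led by
    one broker, and no two topics should share a leader broker.
    """
    problems = []
    leader_to_topic = {}
    for meta in topic_metadata:
        name = meta["topic"]
        partitions = meta["partitions"]
        if any(len(p["replicas"]) != 1 for p in partitions):
            problems.append(f"{name}: a partition does not have exactly one replica")
        leaders = {p["leader"] for p in partitions}
        if len(leaders) != 1:
            problems.append(f"{name}: partitions are not all led by one broker")
            continue
        leader = next(iter(leaders))
        if leader in leader_to_topic:
            problems.append(
                f"{name}: shares leader broker {leader} with {leader_to_topic[leader]}")
        else:
            leader_to_topic[leader] = name
    return problems


# Made-up metadata in the assumed shape: one topic per broker, one replica each.
good_layout = [
    {"topic": "t1", "partitions": [{"partition": 0, "leader": 1, "replicas": [1]}]},
    {"topic": "t2", "partitions": [{"partition": 0, "leader": 2, "replicas": [2]}]},
    {"topic": "t3", "partitions": [{"partition": 0, "leader": 3, "replicas": [3]}]},
]
# t1 is replicated to two brokers, and t2 shares broker 1 with t1.
bad_layout = [
    {"topic": "t1", "partitions": [{"partition": 0, "leader": 1, "replicas": [1, 2]}]},
    {"topic": "t2", "partitions": [{"partition": 0, "leader": 1, "replicas": [1]}]},
]

good_problems = check_unbalanced_layout(good_layout)
bad_problems = check_unbalanced_layout(bad_layout)
```

Against a live cluster, the input would come from something like admin.describe_topics(['t1', 't2', 't3']), assuming the returned shape matches.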

OneCricketeer
  • Thanks for the suggestions. I just wanted to understand: in the snippet above, if we don't use the topic list, how do t1, t2, and t3 correspond to the topics we want to produce our data to? – Bhuvi Jul 14 '21 at 14:08
  • Those strings are the topic names. – OneCricketeer Jul 14 '21 at 14:53