
I have a question about the way messages are published and read in Kafka for microservice architectures with multiple instances of the same microservice for writing and reading. My main problem is that the microservices that publish and read are configured with autoscaling but a default instance count of 1.

The point is that I have an entity, let's call it "Event", that is stored in the database, and each entity has its own ID in the database. When a specific command is executed on a specific entity (say with entityID = ajsha87), a message must be published that will be read by a consumer. If these messages for the same entity are written to different partitions and consumed at the same time (a concurrency issue), I will have a lot of problems.

My question is whether, based on the entityID for example, I can set the partition to which all events of that specific entity will be published. For another entity with a different ID I don't care about the partition, but the messages for the same entity must always be published to the same partition, to avoid a consumer reading a message (2) published after a message (1) before that message (1). Is there any mechanism to do that, or do I have to randomly choose a partition ID each time I save an entity and store it in the database so its messages are always published there?

The same happens with consumers. Only one consumer can read a partition at a time, because otherwise consumer number 1 could read message (1) from partition (1) related to entity (ID=78198), and then another consumer could read message (2) from partition (1) related to the same entity and process message 2 before number 1 finishes.

Is there any mechanism to subscribe each instance to only one partition, in line with the microservice autoscaling?

Another option would be to dynamically assign a partition to each new publisher instance, but I don't know how to configure that dynamically so that different partition IDs are set according to the microservice instance.

I am using Spring Boot, by the way.

Thanks for your answers and recommendations, and sorry if my English is not good enough.

1 Answer


If you use a hash partitioner as the partitioner in the producer config (this is the default partitioner in many client libraries) and use the same key for the same entity (say, entityID = ajsha87), Kafka will send all messages with the same key to the same partition.
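
As a minimal Spring Boot sketch (the topic name `events` and the String-serialized payload are assumptions, not from the question), passing the entity ID as the record key is enough for the default partitioner to route every message for that entity to one partition:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class EventPublisher {

    private final KafkaTemplate<String, String> kafkaTemplate;

    public EventPublisher(KafkaTemplate<String, String> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    // Using the entity ID as the record key: the default partitioner
    // hashes the key, so all messages for one entity go to the same
    // partition and keep their publication order there.
    public void publish(String entityId, String payload) {
        kafkaTemplate.send("events", entityId, payload);
    }
}
```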

If you are using a consumer group, one consumer instance takes responsibility for a partition, and all messages published to that partition are consumed by that instance only. The assigned instance can change during a rebalance when you scale up, but messages in the same partition will still be read by only one consumer instance at a time.
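
A matching consumer sketch (topic and group names are again placeholders): as long as every autoscaled instance starts with the same `groupId`, Kafka assigns each partition to at most one of them:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

@Component
public class EventConsumer {

    // Every instance of this microservice joins the same consumer group,
    // so each partition is consumed by exactly one instance at a time and
    // messages for a given entity are processed sequentially.
    @KafkaListener(topics = "events", groupId = "event-consumers")
    public void onEvent(String payload) {
        // process the event here
    }
}
```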

nipuna
  • I understood the part about the consumer; it is simple, as I can see: just setting the same group-id on all consumers assures me that only one instance reads a specific partition. But I still don't understand the producing part, because I don't know how Kafka can remember that, since the first message for entity ajsha87 was published to partition one, all messages with this hash must always go to partition one. Does this also work when creating more entities with different hashes? – Gabriel García Garrido May 06 '21 at 16:07
  • _If the key is provided, the partitioner will hash the key with the murmur2 algorithm and divide it by the number of partitions. The result is that the same key is always assigned to the same partition. If a key is not provided, behaviour is Confluent Platform version-dependent._ If you use this hash-based partitioner, every message with the same key goes to the same partition (see the sketch below these comments). Please refer to [https://docs.confluent.io/platform/current/clients/producer.html#concepts](https://docs.confluent.io/platform/current/clients/producer.html#concepts) for more information. – nipuna May 06 '21 at 17:14
  • Ok, thank you so much for clarifying my doubts! – Gabriel García Garrido May 07 '21 at 08:04
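
To illustrate the comment above, this is roughly what the default partitioner computes from the key. The sketch uses Kafka's own `Utils` helpers, and the partition count of 3 is an arbitrary example:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionDemo {
    public static void main(String[] args) {
        int numPartitions = 3; // arbitrary example value

        // murmur2(key) mod partition count: a pure function of the key,
        // so the same entityID always maps to the same partition and
        // nothing has to be remembered or stored in the database.
        byte[] key = "ajsha87".getBytes(StandardCharsets.UTF_8);
        int partition = Utils.toPositive(Utils.murmur2(key)) % numPartitions;

        System.out.println("entityID ajsha87 -> partition " + partition);
    }
}
```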