
I have a Kafka cluster (MSK) in 3 availability zones with replication factor 3, so each zone holds one copy of my data. I have rack awareness enabled, which allows consumers to read from the nearest broker. Can I implement something similar for producers? I want to write my data directly to a broker in my own AZ. I can recreate the Kubernetes pod so that it runs in the same AZ as the leader.
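For context, consumer-side rack awareness (KIP-392, "fetch from follower") works roughly like this: brokers have `broker.rack` set, the broker config `replica.selector.class` is set to `org.apache.kafka.common.replica.RackAwareReplicaSelector`, and the consumer sets `client.rack`. The broker then points the consumer at a replica in its rack. Below is a simplified model of that choice, a sketch only: `Replica` and `choose_fetch_replica` are hypothetical names, and, as I understand it, the real selector also breaks ties between several same-rack followers by how caught-up they are, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str  # assumption: the broker's rack is its AZ (on MSK, the AZ ID)


def choose_fetch_replica(leader: Replica, replicas: List[Replica],
                         client_rack: Optional[str]) -> Replica:
    """Simplified model of rack-aware follower fetching.

    Prefer the leader if the client has no rack or the leader is in the
    client's rack; otherwise use a replica in the client's rack; if there
    is none, fall back to the leader.
    """
    if client_rack is None or leader.rack == client_rack:
        return leader
    for replica in replicas:
        if replica.rack == client_rack:
            return replica
    return leader


replicas = [Replica(1, "a"), Replica(2, "b"), Replica(3, "c")]
# Leader is in AZ "b", consumer is in AZ "a": fetch from the follower in "a".
print(choose_fetch_replica(replicas[1], replicas, "a"))
```

Nothing equivalent exists on the produce path: a produce request always goes to the leader, which is what the rest of the question is about.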

But the partitions of a topic have their leaders in different AZs, so my application ends up writing to leaders in different zones.

For example, if all partition leaders were in one AZ, I could run my service in that zone and have only two cross-AZ data transfers per record (a->a->(b,c)). In reality the partition leaders are spread across zones (a->b->(a,c)), so I get three cross-AZ transfers.
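The counting in this example can be written out as a small function. This is a sketch assuming one replica per AZ and counting only produce and replication traffic (consumer fetches are ignored); `cross_az_copies` is a hypothetical helper name.

```python
from typing import Iterable


def cross_az_copies(producer_az: str, leader_az: str,
                    follower_azs: Iterable[str]) -> int:
    """Count cross-AZ transfers for one produced record.

    One hop producer -> leader, then one hop leader -> each follower.
    A hop counts only when its two ends are in different AZs.
    """
    hops = int(producer_az != leader_az)
    hops += sum(1 for az in follower_azs if az != leader_az)
    return hops


# The two cases from the question (replication factor 3, one replica per AZ):
print(cross_az_copies("a", "a", ["b", "c"]))  # a -> a -> (b, c): prints 2
print(cross_az_copies("a", "b", ["a", "c"]))  # a -> b -> (a, c): prints 3
```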

Does anyone have ideas for how I can get all of a topic's partition leaders into one AZ? Or, as another option, how to write only to the partitions whose leaders are in the same AZ as the producer?
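For the second option, the producer needs to know which partitions currently have a local leader. Clients already receive this in metadata: in the Java client, `Cluster.partitionsForTopic(topic)` returns `PartitionInfo` objects whose `leader()` is a `Node`, and `Node.rack()` gives that broker's rack (assumed here to be the AZ). A minimal sketch of the filtering step, with the metadata flattened into a hypothetical `{partition: leader_az}` dict:

```python
from typing import Dict, List


def partitions_led_in_az(leader_az_by_partition: Dict[int, str],
                         local_az: str) -> List[int]:
    """Return the partitions whose current leader is in local_az, sorted."""
    return sorted(p for p, az in leader_az_by_partition.items() if az == local_az)


# Hypothetical metadata for a 6-partition topic spread over AZs a, b, c.
leaders = {0: "a", 1: "b", 2: "c", 3: "a", 4: "b", 5: "c"}
print(partitions_led_in_az(leaders, "a"))  # prints [0, 3]
```

Leadership can change at any time, so this list has to be recomputed whenever the client refreshes metadata.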

Can anyone give some advice on this?

klynxe
  • Do you really want to have all the leaders in one AZ? What if that AZ goes down? You could potentially lose messages across ALL topics if they were not produced with enough in-sync replicas. If you produce with more than 1 in-sync replica... then you're writing to multiple AZs – Ftisiot Aug 12 '22 at 14:40
  • I don't really want to have all apps in one zone; I really want to reduce data transfer – klynxe Aug 12 '22 at 15:23
  • @klynxe were you able to find any solution? I'm looking for a similar option to save whatever cross-AZ transfer cost I can. Thanks! – Jeesmon Aug 22 '23 at 19:30
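The point about in-sync replicas can be made concrete. With `acks=all`, the leader rejects a write (a `NotEnoughReplicas` error) when the in-sync replica set (ISR) is smaller than `min.insync.replicas`, and an accepted write is acknowledged only once every ISR member has it. The sketch below models both rules; the function names and the broker-to-AZ map are hypothetical. With one broker per AZ, any setting that tolerates the loss of an AZ necessarily puts copies in more than one AZ.

```python
from typing import Dict, List


def acks_all_write_allowed(isr: List[int], min_insync_replicas: int) -> bool:
    """acks=all: the leader refuses the write if the ISR is too small."""
    return len(isr) >= min_insync_replicas


def azs_holding_acked_copy(isr: List[int], az_by_broker: Dict[int, str]) -> int:
    """Number of distinct AZs holding a copy once an acks=all write is acked."""
    return len({az_by_broker[broker] for broker in isr})


az_by_broker = {1: "a", 2: "b", 3: "c"}  # hypothetical: one broker per AZ
print(acks_all_write_allowed([1, 2, 3], 2), azs_holding_acked_copy([1, 2, 3], az_by_broker))
```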

1 Answer


Producers hash keys into specific partitions and, last I checked, can only produce to the leader of a partition, which lives on one specific broker (compared to consumers, which can fetch from replicas). There can only be one leader per partition at a time.
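The key-to-partition step can be illustrated with a Python port of the hash the Java client uses by default for keyed records: murmur2 of the serialized key, sign bit cleared, modulo the partition count. This is a sketch written from my understanding of the client's `Utils.murmur2`; verify it against the real client before relying on it for cross-language compatibility. Records without a key are assigned differently (sticky partitioning in recent clients). Whatever partition comes out, the record is then sent to that partition's leader, wherever it is.

```python
def murmur2(data: bytes) -> int:
    """Murmur2 (seed 0x9747b28c) as an unsigned 32-bit value."""
    mask = 0xFFFFFFFF
    m = 0x5BD1E995
    r = 24
    length = len(data)
    h = (0x9747B28C ^ length) & mask

    # Process the input four bytes at a time as little-endian words.
    for i in range(length // 4):
        i4 = i * 4
        k = data[i4] | (data[i4 + 1] << 8) | (data[i4 + 2] << 16) | (data[i4 + 3] << 24)
        k = (k * m) & mask
        k ^= k >> r
        k = (k * m) & mask
        h = (h * m) & mask
        h ^= k

    # Mix in the remaining 1-3 bytes (mirrors the Java switch fall-through).
    tail = length & ~3
    rem = length % 4
    if rem == 3:
        h ^= data[tail + 2] << 16
    if rem >= 2:
        h ^= data[tail + 1] << 8
    if rem >= 1:
        h ^= data[tail]
        h = (h * m) & mask

    # Final avalanche.
    h ^= h >> 13
    h = (h * m) & mask
    h ^= h >> 15
    return h


def partition_for_key(key: bytes, num_partitions: int) -> int:
    # Clear the sign bit (Java's toPositive) before taking the modulo.
    return (murmur2(key) & 0x7FFFFFFF) % num_partitions


print(partition_for_key(b"order-42", 6))
```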

So no, it's not possible for producers to write to a "closer" replica; they must write to the one replica that is the leader.

If an AZ goes offline, then Kafka will elect a new partition leader, and your producer may attempt to retry, if you've configured it to.
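A simplified model of what happens on failover: with unclean leader election disabled, the new leader is the first replica in the partition's assigned replica order that is both alive and in the ISR. This is a sketch (`elect_leader` is a hypothetical name), and it shows why any AZ-local producer strategy has to follow leadership changes: after the AZ "a" broker fails, the leader moves to another AZ.

```python
from typing import List, Optional, Set


def elect_leader(assigned_replicas: List[int], isr: Set[int],
                 live_brokers: Set[int]) -> Optional[int]:
    """Clean election: first replica, in assignment order, that is live and in the ISR."""
    for broker in assigned_replicas:
        if broker in live_brokers and broker in isr:
            return broker
    return None  # no eligible replica: the partition stays offline


# Hypothetical layout: broker 1 in AZ a, 2 in AZ b, 3 in AZ c; AZ a goes down.
print(elect_leader([1, 2, 3], isr={1, 2, 3}, live_brokers={2, 3}))  # prints 2
```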

OneCricketeer
  • So I can imagine only one way to remove that data transfer: a custom partition selector that chooses only among partitions whose leader is in the current AZ (I would need enough partitions that at least one of them has its leader in the producer's AZ) – klynxe Aug 12 '22 at 19:44
  • There is no `rack` config for producers, so to implement this, you'd have to write your own `partitioner.class`. Also, you're not really reducing transfer if you have `replication.factor > 1` and the partition replicas are in other AZs. In other words, even if your producer did write to a local AZ, the data would still be sent to another one, regardless – OneCricketeer Aug 12 '22 at 22:09
  • That isn't true. For example: 1) producer in A, partition leader in A (replica in B). If I write in A, I have only one cross-AZ copy (A->B). 2) producer node in A, partition leader in B (with a replica in A). If I write to B, the data crosses AZs twice: A->(to leader)B->(to replica)A – klynxe Aug 14 '22 at 13:14
  • You said you had `replication.factor=3`, so you're forgetting `->C` in both cases. But assuming it was only 2, then yes – OneCricketeer Aug 15 '22 at 14:52
  • With three replicas the effect is smaller, but it's the same idea: 1) A->A->(B,C), 2 cross-zone copies in total; 2) B->A->(B,C), 3 copies in total. So a 33% reduction in data transfer – klynxe Aug 15 '22 at 20:13
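The custom-partitioner idea from the comments can be sketched as plain selection logic: restrict the candidates to partitions whose leader is in the producer's AZ, fall back to all partitions when none are local, hash keyed records over the candidates, and round-robin unkeyed ones. This is a hypothetical sketch, not an existing Kafka feature; `choose_partition` and the `{partition: leader_az}` map are illustrative names, and `zlib.crc32` is a stand-in hash (the Java client's default uses murmur2). In the Java client the equivalent logic would live in a class implementing `org.apache.kafka.clients.producer.Partitioner`, registered via `partitioner.class`, reading leaders and racks from the `Cluster` passed to `partition(...)`.

```python
import zlib
from typing import Dict, Optional


def choose_partition(key: Optional[bytes], leader_az_by_partition: Dict[int, str],
                     local_az: str, counter: int = 0) -> int:
    """Hypothetical AZ-local partitioning logic.

    Candidates are the partitions led from local_az, or all partitions if
    none are local. Keyed records are hashed over the candidates; unkeyed
    records are spread round-robin using a caller-supplied counter.
    """
    all_partitions = sorted(leader_az_by_partition)
    if not all_partitions:
        raise ValueError("topic has no partitions in metadata")
    local = [p for p in all_partitions if leader_az_by_partition[p] == local_az]
    candidates = local or all_partitions
    if key is None:
        return candidates[counter % len(candidates)]
    # Stand-in hash; not the murmur2 hash used by the Java client's default.
    return candidates[zlib.crc32(key) % len(candidates)]


# Hypothetical 6-partition topic; the producer runs in AZ "a".
leaders = {0: "a", 1: "b", 2: "c", 3: "a", 4: "b", 5: "c"}
print(choose_partition(None, leaders, "a", counter=0))  # prints 0
print(choose_partition(None, leaders, "a", counter=1))  # prints 3
```

Two trade-offs follow from this design. The same key no longer maps to a fixed partition once leaders move, so per-key ordering is not preserved across leadership changes, and load concentrates on the local partitions. It also only saves the producer-to-leader hop: as noted in the comments, leader-to-follower replication still crosses AZs.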