
Can anyone please help? I have the requirement below.

Requirement: process chat messages in order and without duplicates, and bundle them per ProgramUserId. Here are the process and the topics involved.

Data setup: a ProgramUserId can have any number of messages, but each message is unique and has a composite key: MsgId + Action. So imagine data in Kafka like this:

P2->M3+A1 , P2->M2+A1 , P2->M1+A1 , P1->M3+A1 , P1->M2+A2 , P1->M2+A1 , P1->M1+A1

I am doing this right now:

Initial-Topic (original key: ProgramUserId)

1. From Initial-Topic, consume a KStream (re-keying to MsgId + Action), then write to topic dedup-Topic.

2. From dedup-Topic, consume a KStream (re-keying back to the original key, ProgramUserId), then write to topic Final-Topic.

Since we are re-keying at dedup-Topic, the message order gets messed up: re-keying causes re-partitioning, and hence there is no guarantee of ordering.
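To make the ordering problem concrete, here is a small plain-Java model (not actual Kafka Streams code; all names are illustrative). Kafka's real default partitioner hashes the key bytes with murmur2, but any deterministic hash-mod-N rule shows the same effect: while all of P1's records share one partition under the original key, re-keying them to MsgId + Action can scatter them across partitions, and Kafka only guarantees order within a single partition.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RekeyDemo {
    // Stand-in for Kafka's partitioner: deterministic hash of the key, mod
    // the partition count (Kafka actually uses murmur2, but the effect is
    // the same: one key always maps to one partition).
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 3;

        // P1's messages from the sample data, in produced order: {MsgId, Action}.
        List<String[]> p1 = List.of(
            new String[]{"M3", "A1"},
            new String[]{"M2", "A2"},
            new String[]{"M2", "A1"},
            new String[]{"M1", "A1"});

        // Under the original key "P1", every record lands on the same partition,
        // so their relative order is preserved.
        System.out.println("Partition with key P1 (all records): "
            + partitionFor("P1", numPartitions));

        // After re-keying to MsgId+Action, each record is hashed independently
        // and may land on a different partition -- per-user order is lost.
        Map<String, Integer> partitions = new LinkedHashMap<>();
        for (String[] msg : p1) {
            String newKey = msg[0] + "+" + msg[1];
            partitions.put(newKey, partitionFor(newKey, numPartitions));
        }
        System.out.println("Partitions after re-keying: " + partitions);
    }
}
```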

To achieve deduplication I added the following logic: from dedup-Topic, create a KTable and a Postgres table (via a sink connector). For each incoming message, check the key (MsgId + Action) in both the KTable and the PG table. If no record is found, the message is not a duplicate, so write it to dedup-Topic.
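The duplicate check itself can be sketched in plain Java, with an in-memory Set standing in for the KTable/Postgres lookups (the names here are illustrative, not my actual code). Note that the decision only needs the composite MsgId + Action value, so it can be made while each record still carries its original ProgramUserId key:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupDemo {
    // records: each entry is {ProgramUserId, MsgId, Action}.
    // Returns the non-duplicate records, in arrival order, formatted
    // as "ProgramUserId->MsgId+Action".
    public static List<String> dedup(List<String[]> records) {
        Set<String> seen = new HashSet<>();   // stand-in for the KTable/PG lookup
        List<String> out = new ArrayList<>();
        for (String[] r : records) {
            String compositeKey = r[1] + "+" + r[2];
            // Set.add returns false if the key was already present,
            // i.e. the message is a duplicate and is dropped.
            if (seen.add(compositeKey)) {
                out.add(r[0] + "->" + compositeKey);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The P1 stream from the sample data, with a duplicate of M2+A1 appended.
        List<String[]> input = List.of(
            new String[]{"P1", "M3", "A1"},
            new String[]{"P1", "M2", "A2"},
            new String[]{"P1", "M2", "A1"},
            new String[]{"P1", "M2", "A1"},   // duplicate
            new String[]{"P1", "M1", "A1"});
        System.out.println(dedup(input));
        // -> [P1->M3+A1, P1->M2+A2, P1->M2+A1, P1->M1+A1]
    }
}
```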

But with the above, message order still gets broken, due to the re-keying/re-partitioning into dedup-Topic.

Please help: how can I achieve ordered messages at this point?

Santhi
  • Why do you do `with re-keying to : Msg Id + Action`? Does it contain some business logic or transformation? – Paweł Szymczyk Feb 03 '22 at 17:43
  • Sorry for the delay in responding. Each ProgramUserId can have multiple messages, and to avoid duplicate messages I am re-keying. – Santhi Feb 08 '22 at 21:34

0 Answers