Can anyone please help? I have the requirement below.
Requirement: process chat messages so that they are deduplicated and kept in order, then bundle them per ProgramUserId. Here are the process and the topics involved.
Data setup: a ProgramUserId can have any number of messages, but each message is unique and has a composite key: MsgId + Action. So imagine data in Kafka like this:
P2->M3+A1 , P2->M2+A1 , P2->M1+A1 , P1->M3+A1 , P1->M2+A2 , P1->M2+A1 , P1->M1+A1
Here is what I am doing right now:
Initial-Topic (original key: ProgramUserId)
1) From Initial-Topic, consume a KStream (re-keying to MsgId + Action), then write to topic: dedup-Topic.
2) From dedup-Topic, consume a KStream (re-keying back to the original key, ProgramUserId), then write to topic: Final-Topic.
Since we re-key going into dedup-Topic, the message order gets messed up: re-keying causes re-partitioning, so there is no guarantee of ordering across the new partitions.
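To make the ordering problem concrete, here is a minimal stdlib-only simulation of what re-keying does. It is not the actual Kafka partitioner (Kafka hashes the serialized key with murmur2; `String.hashCode()` here is just a stand-in, and the 3-partition count is an assumption), but the consequence is the same: one user's messages, once re-keyed to MsgId + Action, scatter across partitions, and consumers only see per-partition order.

```java
import java.util.*;

public class RekeySimulation {
    // Stand-in for Kafka's default partitioner: hash of the record key
    // modulo the partition count. (Real Kafka uses murmur2 on the
    // serialized key; the scattering effect is the same.)
    static int partitionFor(String key, int numPartitions) {
        return Math.abs(key.hashCode() % numPartitions);
    }

    // Assigns each re-keyed record to a partition, preserving the
    // per-partition arrival order.
    static Map<Integer, List<String>> partitionAssignment(List<String> keys,
                                                          int numPartitions) {
        Map<Integer, List<String>> byPartition = new TreeMap<>();
        for (String key : keys) {
            byPartition.computeIfAbsent(partitionFor(key, numPartitions),
                                        p -> new ArrayList<>()).add(key);
        }
        return byPartition;
    }

    public static void main(String[] args) {
        // P1's messages in their original order, re-keyed to MsgId+Action
        // for the dedup-Topic (hypothetical 3-partition topic).
        List<String> rekeyed = List.of("M1+A1", "M2+A1", "M2+A2", "M3+A1");
        // Once the keys spread over several partitions, the global
        // per-user order is gone; only per-partition order survives.
        System.out.println(partitionAssignment(rekeyed, 3));
    }
}
```

Re-keying back to ProgramUserId afterwards funnels the records into one partition again, but in whatever interleaved order the downstream consumer happened to read them.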
To achieve deduplication I added the following logic: from dedup-Topic, create a KTable and a Postgres table (via a sink connector). For each incoming message, check its key (MsgId + Action) in both the KTable and the PG table. If the record is not found, it is not a duplicate, so write it to dedup-Topic.
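The dedup check itself boils down to a seen-before lookup on the composite key. A minimal sketch, using an in-memory `HashSet` purely as a stand-in for the KTable state store / Postgres lookup described above (the class and method names are hypothetical):

```java
import java.util.*;

public class DedupCheck {
    // Composite keys (MsgId + Action) already seen. In the real pipeline
    // this lookup would hit the KTable's state store and/or the Postgres
    // table; a HashSet is only an illustration.
    private final Set<String> seen = new HashSet<>();

    // Returns true if the record is new (forward it to dedup-Topic),
    // false if it is a duplicate (drop it).
    public boolean forwardIfNew(String msgId, String action) {
        // Set.add returns false when the element was already present.
        return seen.add(msgId + "+" + action);
    }
}
```

For example, feeding P1's stream with a repeated M2+A1 forwards the first occurrence and drops the second.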
But with the above, the message order still gets messed up because of the re-keying / re-partitioning into dedup-Topic.
Please help: how can I achieve ordered messages at this point?