I have two topics:

- 1 topic with event data (`EventData`, let's say 5 partitions) -- logs on this topic use CustomerID as the key.
- 1 compacted topic with enrichment data (`EnrichmentKVs`, let's say 3 partitions) -- logs on this topic use the same CustomerID as the key.
The goal is to keep `EnrichmentKVs` in a Faust table, and when `EventData` logs are streamed in, enrich them with the data from that table and publish the result to a new stream/topic.
So I have two Faust (Python) applications, each with its own number of instances running:

- App1 (N instances running) publishes to the `EventData` topic with key=CustomerID
- App2 (M instances running) does the following:
  - updates the Faust table (`EnrichmentKVsTable`) with values from the `EnrichmentKVs` topic
  - streams in from the `EventData` topic and "joins" the data from the Faust table with the data streaming from `EventData`
My understanding is that every instance of App2 will hold only a partial table of `EnrichmentKVs`, determined by the partitioning key. For the "join" to work, any `EventData` log with key="1234" must go to the same App2 instance that holds the `EnrichmentKVsTable` entry for key="1234".
How can Faust ensure this when the two input topics are partitioned differently, and the number of instances of each application might also differ? Or am I approaching this problem wrong?
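To make the concern concrete, here is a toy illustration. This is not Kafka's actual partitioner (real clients hash the key bytes, e.g. with murmur2); a plain modulo over the numeric key is used only so the arithmetic is easy to follow. The point is that the same key maps to different partition numbers when one topic has 5 partitions and the other has 3:

```python
# Toy stand-in for a partitioner: map a key to a partition number by
# taking it modulo the partition count. Illustrative only.
def partition_for(key: str, num_partitions: int) -> int:
    return int(key) % num_partitions

key = "1234"
p_event = partition_for(key, 5)   # EventData has 5 partitions: 1234 % 5 == 4
p_enrich = partition_for(key, 3)  # EnrichmentKVs has 3 partitions: 1234 % 3 == 1

# Same key, different partition numbers -- so without co-partitioning, the
# event and its enrichment entry can end up on different App2 instances.
print(p_event, p_enrich)  # -> 4 1
```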