
I have two topics:

  • 1 topic with event data (EventData, let's say 5 partitions) -- messages on this topic use CustomerID as the key.
  • 1 compacted topic with enrichment data (EnrichmentKVs, let's say 3 partitions) -- messages on this topic use the same CustomerID as the key.

The goal is to keep EnrichmentKVs in a Faust table so that, as EventData messages stream in, each one is enriched with the matching data from that table and published to a new topic.

So I have two Faust (python) applications, each with its own number of instances running:

  • App1 (N instances running) publishes to the EventData topic with key=CustomerID
  • App2 (M instances running) does the following:
    • updates the Faust table (EnrichmentKVsTable) with values from the EnrichmentKVs topic
    • consumes the EventData topic and "joins" each event with the matching entry in the Faust table

My understanding is that every instance of App2 will hold only part of the EnrichmentKVs table, determined by the partitioning key. For the "join" to work, any messages for EventData(key="1234") must go to the same App2 instance that holds EnrichmentKVsTable(key="1234").

How can Faust ensure this when the two input topics have different partition counts, and the number of instances of each application might also differ? Or am I approaching this problem the wrong way?
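To make the concern concrete, here is a minimal plain-Python sketch (no Faust or Kafka involved). It assumes keyed messages are placed by hashing the key and taking the result modulo the partition count. `partition_for` is a hypothetical helper, and CRC32 is only a stand-in for the hash a real Kafka client uses (the Java client's default partitioner uses murmur2), so the actual partition numbers will differ from a real cluster.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Toy partitioner: stable hash of the key, modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


EVENT_DATA_PARTITIONS = 5   # EventData topic
ENRICHMENT_PARTITIONS = 3   # EnrichmentKVs topic

# Illustrative CustomerID values (made up for this sketch).
customer_ids = [str(n) for n in range(1000, 1030)]

# Keys whose partition number differs between the two topics.
diverging = [
    cid
    for cid in customer_ids
    if partition_for(cid, EVENT_DATA_PARTITIONS)
    != partition_for(cid, ENRICHMENT_PARTITIONS)
]

print(f"{len(diverging)} of {len(customer_ids)} keys land on different partition numbers")
```

If both topics had the same partition count, the same function would return the same partition number for a given key on both topics, which is the property a partition-local join relies on.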

booleys1012

1 Answer


Update (Kafka Streams info, not Faust-specific):

Things Learned:

  1. It looks like it IS a requirement in Kafka Streams that the topics being joined have the same number of partitions (co-partitioning), and I'm assuming Faust has the same restriction. That explains my concern above, except...
  2. ... Kafka Streams also has global tables (GlobalKTable), which are replicated in full to every instance, and Faust 1.9+ has a corresponding feature, GlobalTable (linked).
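The difference between the two options can be sketched as a small plain-Python simulation. This is not Faust code: `partition_for`, `owner`, and `join_hits` are hypothetical helpers, CRC32 stands in for the real partitioner hash, and the "partition p goes to instance p mod M" assignment is a toy rule, not Faust's actual partition assignor. A lookup counts as a hit when the instance that receives the event also holds the table row for that key.

```python
import zlib
from typing import List


def partition_for(key: str, num_partitions: int) -> int:
    """Toy partitioner: stable hash of the key, modulo the partition count."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def owner(partition: int, num_instances: int) -> int:
    """Toy assignment (an assumption): partition p is owned by instance p mod M."""
    return partition % num_instances


def join_hits(keys: List[str], event_partitions: int, table_partitions: int,
              num_instances: int, global_table: bool) -> int:
    """Count events whose receiving instance can find the table row locally."""
    hits = 0
    for key in keys:
        event_instance = owner(partition_for(key, event_partitions), num_instances)
        table_instance = owner(partition_for(key, table_partitions), num_instances)
        # A global table is fully replicated, so every instance has every row.
        if global_table or event_instance == table_instance:
            hits += 1
    return hits


keys = [str(n) for n in range(1000, 1100)]  # made-up CustomerID values
M = 2  # number of App2 instances in this sketch

mismatched = join_hits(keys, 5, 3, M, global_table=False)     # 5 vs 3 partitions
copartitioned = join_hits(keys, 5, 5, M, global_table=False)  # equal partition counts
replicated = join_hits(keys, 5, 3, M, global_table=True)      # global table

print(f"5 vs 3 partitions, partitioned table: {mismatched}/{len(keys)} local hits")
print(f"5 vs 5 partitions, partitioned table: {copartitioned}/{len(keys)} local hits")
print(f"5 vs 3 partitions, global table:      {replicated}/{len(keys)} local hits")
```

In this toy model, equal partition counts and a global table both make every lookup local, while mismatched partition counts leave some events on an instance that does not hold the matching row. The trade-off with a global table is that every instance stores the whole enrichment data set.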

Hopefully this serves as a breadcrumb for the next person who runs into a question similar to mine above.

booleys1012
  • Did you find an answer? It looks like even in the updated repo, faust-streaming, joining is still not supported. – jth_92 Jan 31 '23 at 06:24