
I want to have two Faust agents listening to the same Kafka topic, but each agent applies its own filter before processing the events, and their event sets don't intersect.

The documentation has an example: https://faust.readthedocs.io/en/latest/userguide/streams.html#id4

If two agents use streams subscribed to the same topic:

topic = app.topic('orders')

@app.agent(topic)
async def processA(stream):
    async for value in stream:
        print(f'A: {value}')

@app.agent(topic)
async def processB(stream):
    async for value in stream:
        print(f'B: {value}')

The Conductor will forward every message received on the “orders” topic to both of the agents, increasing the reference count whenever it enters an agent's stream.

The reference count decreases when the event is acknowledged, and when it reaches zero the consumer will consider that offset as “done” and can commit it.

And below, for filters, https://faust.readthedocs.io/en/latest/userguide/streams.html#id13:

@app.agent()
async def process(stream):
    async for value in stream.filter(lambda v: v > 1000).group_by(...):
        ...

I use a complicated filter, but the result is to divide the stream into two parts for two agents with completely different logic. (I don't use group_by.)

If the two agents are running together, everything is OK. But if I stop and restart them, each one processes the stream from the beginning, because every event was left unacknowledged by one of the agents. If instead I acknowledge all events in every agent, then when one of the agents is not started, the other will consume and commit the whole topic. (If one agent crashes and is restarted, the Conductor will see three subscribers, as it waits 20 minutes for the crashed agent to respond.)

I just want to separate the events into two parts. How can I do the appropriate synchronization in this case?
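For concreteness, a minimal sketch of the setup described above, assuming the documented `Stream.filter()` API; the broker URL, field names, and the predicate are placeholders, not the real application:

import faust

app = faust.App('myapp', broker='kafka://localhost:9092')
orders = app.topic('orders')

def is_first_subset(event):
    # hypothetical predicate over three fields
    return event['region'] == 'eu' and event['amount'] > 100

@app.agent(orders)
async def process_a(stream):
    # first agent: only events matching the predicate
    async for event in stream.filter(is_first_subset):
        ...

@app.agent(orders)
async def process_b(stream):
    # second agent: the complementary, non-intersecting set
    async for event in stream.filter(lambda e: not is_first_subset(e)):
        ...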

  • Both are using the same group id? What offset did you configure the consumer to start at? – OneCricketeer Feb 12 '20 at 23:58
  • @cricket_007 I've just started from the docs' default samples. – Роман Коптев Feb 13 '20 at 08:14
  • Okay, well. Using group by is not a filter, and what exactly do you need to synchronize? – OneCricketeer Feb 13 '20 at 08:18
  • @cricket_007 I use `filter`, but I don't use `group_by` because I don't have a single field to split the stream on. The filter function has a condition on three fields to split the stream into two non-intersecting parts for two agents. – Роман Коптев Feb 13 '20 at 08:22
  • Okay, and what's the problem exactly? – OneCricketeer Feb 13 '20 at 08:34
  • @cricket_007 The events in the stream should go to the appropriate agent and should be deleted from the stream after that. And now, if one agent filters out an event, the event remains in the queue, because it is unacknowledged by that agent, as the agents are in the same group. – Роман Коптев Feb 13 '20 at 08:41
  • @cricket_007 Is it possible to assign a group_id to an agent explicitly? I can also create an in-memory channel and forward the filtered events to it, then subscribe an agent to the filtered channel. But I think I could lose events that way if the script crashes. And in that case the reply can't be sent. – Роман Коптев Feb 13 '20 at 10:54
  • 1) It's impossible to remove messages from a Kafka topic after reading them. That's not how Kafka works. – OneCricketeer Feb 13 '20 at 13:08
  • 2) You assign the group by assigning the ID: https://faust.readthedocs.io/en/latest/userguide/settings.html#std:setting-id. And Kafka has at-least-once semantics by default, so if you're doing a filter, chances are you'll end up filtering the same events more than once. – OneCricketeer Feb 13 '20 at 13:14
  • @cricket_007 2) But that's a global setting. What can I do to use it at the agent level? Create several apps? – Роман Коптев Feb 13 '20 at 15:20
  • Yes, you would have to. Thus "microservices". (A sketch of this approach follows these comments.) – OneCricketeer Feb 13 '20 at 15:44
  • Should you use a `callable` as the first argument of group_by? @РоманКоптев – William Sep 25 '20 at 06:27
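To make the two-app suggestion from the comments concrete: below is a minimal sketch, assuming two separate Faust apps whose distinct `id` settings give each its own consumer group, so each commits offsets independently and stopping one never blocks the other. The app ids, file layout, and predicate are assumptions, not the asker's actual code.

# app_a.py -- first service; a distinct app id means a distinct consumer group
import faust

app = faust.App('orders-processor-a', broker='kafka://localhost:9092')
orders = app.topic('orders')

def is_first_subset(event):
    return event['kind'] == 'a'  # hypothetical predicate over the real fields

@app.agent(orders)
async def process_a(stream):
    async for event in stream:
        if is_first_subset(event):
            ...  # this service's logic
        # events outside the subset are iterated past and acknowledged,
        # so this group's offsets always advance

# app_b.py is identical except id='orders-processor-b'
# and the negated predicate `not is_first_subset(event)`.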

1 Answer


The Faust filtering has some bugs when it comes to acknowledging filtered-out events. I suggest not using the Faust `filter()` feature, but a simple if/else style when consuming from the stream, similar to the below:

@app.agent(topic)
async def process(stream):
    async for event in stream:
        if event.amount >= 300.0:
            yield event  # forward matching events downstream
        # non-matching events are skipped but still acknowledged
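Applied to the question's two-way split, the same pattern can branch inside one agent, so every event is acknowledged exactly once by a single consumer group. A minimal sketch; the predicate and handlers are hypothetical placeholders:

@app.agent(topic)
async def process_split(stream):
    async for event in stream:
        if is_first_subset(event):       # hypothetical predicate
            await handle_first(event)    # hypothetical handler for one subset
        else:
            await handle_second(event)   # hypothetical handler for the other
        # either way, the event is acknowledged once the loop advances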