
I have a pipeline where a message is produced to topic A, processed by a stream processor, and the enriched data is sent to topic B. Topic B is consumed by 3 other stream processors, which independently perform a small part of the calculation (to reduce the load on a single processor) and forward their enriched data to a new topic each. A final processor reads from all 3 new topics and sends the data on to web clients via web sockets. It all works well, but if the system sits idle for 30 minutes or so with no new messages, it can sometimes take up to 10 seconds for a message to get to the end of the pipeline. Under normal operation this time has been on the order of 10-20 ms.

Every stream processor uses tables to refer to previous data and determine how to enrich new messages, so I'm wondering whether access to these tables slows down after a period without any lookups?
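For context, the table-based enrichment pattern is roughly like the following Faust sketch (topic names, field names, and the app name are illustrative, not from my actual code, and a running Kafka broker is required):

```python
import faust

app = faust.App("enricher", broker="kafka://localhost:9092")

topic_a = app.topic("topic-A")
topic_b = app.topic("topic-B")

# Changelog-backed table holding the last seen value per key,
# consulted to decide how to enrich each new message.
last_seen = app.Table("last-seen", default=dict)

@app.agent(topic_a)
async def enrich(stream):
    async for key, value in stream.items():
        value["previous"] = last_seen[key]  # look up prior state
        last_seen[key] = value              # update the table
        await topic_b.send(key=key, value=value)
```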

If so, it seems a silly workaround, but it might be possible to use a timer to periodically send a dummy dataset so that each worker stays alive and alert.

Below is a print output of the time difference from the message initiation to the arrival time at the end of the pipeline:

[2022-05-23 08:52:46,445] [10340] [WARNING] 0:00:00.017999
[2022-05-23 08:53:03,469] [10340] [WARNING] 0:00:00.025995
[2022-05-23 09:09:46,774] [10340] [WARNING] 0:00:06.179146

I wonder whether any of the broker or agent settings noted on this page would be of use here? If anyone knows, please let me know.

UPDATE

So I ran tests where I used the @app.timer option to send a dummy/test message through the entire pipeline every second, and I never saw an instance of slow send times. I also changed things to talk to the app directly using the @app.page() decorator, rather than a FastAPI endpoint that sends to the topic, and with that I never saw a delay greater than 2 seconds. But the same thing still happened: if the system sat idle for a while and then received a new message, it took almost exactly 2 seconds (plus change) to do its thing. This really starts to look like an agent throttling its poll, or Kafka throttling an agent's connection, when throughput is low.
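The keep-alive test described above can be sketched like this (the topic name and payload are made up for illustration, and a Kafka broker must be running):

```python
import faust

app = faust.App("keepalive-test", broker="kafka://localhost:9092")
topic_a = app.topic("topic-A")

# Fire a dummy message into the head of the pipeline every second
# so that no producer or consumer connection ever sits idle.
@app.timer(interval=1.0)
async def send_keepalive():
    await topic_a.send(value={"keepalive": True})
```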

  • This might be, in part, solved by making requests directly to the faust application rather than to a FastAPI endpoint which then sends to a kafka topic using a faust agent. I think there's a bottleneck there somehow. But this other post of mine might be the solution - request direct (https://stackoverflow.com/questions/72623736/setting-up-cors-with-faust-webview) – Fonty Jun 15 '22 at 05:02
  • There's a little more detail in the github issue I raised (https://github.com/faust-streaming/faust/issues/306) including a cut down version of the code. – Fonty Jun 17 '22 at 01:25

1 Answer


It appears that the issue stems from a Kafka setting, for both consumers and producers, that closes the connection if no messages have been sent/consumed within the designated time frame.

In Faust, you access this via the following settings when you define the app:

app.conf.producer_connections_max_idle_ms
app.conf.consumer_connections_max_idle_ms

and set them to something appropriate. I understand that this setting is probably kept low (9 minutes by default) so that large dynamic clusters can release resources or memory (or something), but in our use case, with a small cluster whose architecture and design will remain static, it isn't an issue (I think) to increase it from 9 minutes to 12 or 24 hours.
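As a sketch, these settings can be passed when constructing the app (the app name and broker URL are illustrative):

```python
import faust

# 12 hours, up from the 9-minute default (Kafka's connections.max.idle.ms).
MAX_IDLE_MS = 12 * 60 * 60 * 1000

app = faust.App(
    "pipeline",
    broker="kafka://localhost:9092",
    producer_connections_max_idle_ms=MAX_IDLE_MS,
    consumer_connections_max_idle_ms=MAX_IDLE_MS,
)
```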

Fonty