
I have some high-volume data streams coming in on different websockets (sensor data, several TB per month), and I want to guarantee that all data is stored, even during high load.

So I want to dispatch the data to my database and to a real-time processing module (e.g. GUI, ML predictions, etc.) in a way that buffers the data streams in case the processing in those modules is too slow, so that they can 'catch up' when the load decreases.

What I tried: Python threads with queues (from the `queue` / `threading` modules), but if the queue operations block I can't ensure the data isn't congested, and if they are non-blocking (e.g. `asyncio.Queue`) I get race conditions and things blow up.

So maybe I should use some kind of callback mechanism, but I don't know what to look for. I hope the question is not too vague. If anybody has a pointer to what I could try, ideally using Python only, that would really help me a lot, even if it's just an idea.
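For reference, this is roughly the shape of the setup I mean (a minimal sketch with placeholder names and toy data, not my actual code): one thread reads from the stream into a bounded queue, and a slower consumer drains it.

```python
import queue
import threading

SENTINEL = object()
buffer = queue.Queue(maxsize=1000)  # bounded buffer between reader and consumer

def reader():
    # stand-in for the websocket loop receiving sensor messages
    for seq in range(10_000):
        buffer.put({"seq": seq})  # blocks when the buffer is full, stalling the reader
    buffer.put(SENTINEL)

def consumer():
    # stand-in for the database write / real-time processing side
    while True:
        msg = buffer.get()
        if msg is SENTINEL:
            break
        # ... store msg, update the GUI, run predictions ...

t1 = threading.Thread(target=reader)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
```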

  • [Apache Kafka](https://kafka.apache.org/) is a good streaming tool, and you can use it from Python via the [faust](https://faust.readthedocs.io/en/latest/) library. – ahmadgh74 Feb 02 '21 at 06:26
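For what it's worth, a minimal sketch of what a faust consumer could look like, assuming a local Kafka broker on `localhost:9092` and a topic named `sensor-readings` (both names are assumptions, not from the question):

```python
import faust

# assumes a Kafka broker at localhost:9092 and a 'sensor-readings' topic
app = faust.App("sensor-pipeline", broker="kafka://localhost:9092")

class SensorReading(faust.Record):
    sensor_id: str
    value: float

readings = app.topic("sensor-readings", value_type=SensorReading)

@app.agent(readings)
async def process(stream):
    async for reading in stream:
        # Kafka durably buffers the stream, so this consumer can catch up after load spikes
        ...  # write to the database / feed the real-time modules

if __name__ == "__main__":
    app.main()
```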

1 Answer


So maybe I should use some kind of callback mechanism, but I don't know what to look for.

Looks like you need a Future.
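A minimal sketch of that idea with `concurrent.futures` (the worker function and callback are placeholders, not from the question):

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def store_in_db(msg):
    # placeholder for the actual database write
    return msg["seq"]

def on_stored(future):
    # callback fires once the write has finished (or raised)
    print("stored:", future.result())

def handle_message(msg):
    future = executor.submit(store_in_db, msg)  # returns immediately with a Future
    future.add_done_callback(on_stored)         # the receive path never blocks

handle_message({"seq": 1})
executor.shutdown(wait=True)
```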

What I tried: Python threads with queues (from the `queue` / `threading` modules), but if the queue operations block I can't ensure the data isn't congested, and if they are non-blocking (e.g. `asyncio.Queue`) I get race conditions and things blow up.

You can try to use the non-blocking methods of `Queue`, `put_nowait()` and `get_nowait()`, and go to the database when a `queue.Full` or `queue.Empty` exception is caught.
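A minimal sketch of that pattern (the overflow list stands in for the database, as an assumption):

```python
import queue

buffer = queue.Queue(maxsize=1000)
overflow = []  # placeholder for the database used as an overflow / backlog store

def on_message(msg):
    # producer side: never block the websocket reader
    try:
        buffer.put_nowait(msg)
    except queue.Full:
        overflow.append(msg)  # spill straight to persistent storage instead

def drain_one():
    # consumer side: prefer the in-memory buffer, catch up from storage when it is empty
    try:
        return buffer.get_nowait()
    except queue.Empty:
        return overflow.pop(0) if overflow else None

# toy usage
for i in range(5):
    on_message({"seq": i})
while (item := drain_one()) is not None:
    print("processing", item)
```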

madbird