
I recently started digging deeper into asynchronous code in Python, and I am wondering why asyncio.sleep is so important.

Use Case

  • I have a synchronous source of data coming from a microphone every x milliseconds.
  • I check whether there is a wakeword and, if so, open a WebSocket connection to my server.
  • I send and receive messages asynchronously and independently.

My ideal implementation is that as soon as a message is ready it is sent, and as soon as a message is received it is processed.

This must be efficient, since we want to go down to x = 20 ms (frames from the microphone arriving every 20 ms).
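Since stream.read(CHUNK) returns only once CHUNK frames have been captured, the 20 ms cadence is fixed by CHUNK and RATE together. A quick sanity check of that arithmetic, assuming a hypothetical 16 kHz setup (the actual values in src.utils.constants are not shown in the question):

```python
# Hypothetical values: the real ones live in src.utils.constants.
RATE = 16_000   # samples per second (assumed)
FRAME_MS = 20   # desired interval between chunks, in milliseconds


def frames_per_chunk(rate: int, frame_ms: int) -> int:
    """Frames that stream.read() must collect so one chunk spans frame_ms."""
    return rate * frame_ms // 1000


CHUNK = frames_per_chunk(RATE, FRAME_MS)
print(CHUNK)  # 320 frames per 20 ms chunk at 16 kHz
```

At 44.1 kHz the same 20 ms window would be 882 frames.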

Implementation

The code is the following:

  • It follows a consumer/producer approach: the consumer receives messages and the producer sends them.
  • The frames from the microphone are put in a synchronous queue.
  • The producer and consumer are handled in a different thread.
  • The queue is shared between the main thread and the other thread. As soon as a new chunk is put in, it is processed on the other end.
import asyncio
import msgpack
import os
import pyaudio
import ssl
import websockets

from threading import Thread
from queue import Queue
from dotenv import load_dotenv

# some utilities
from src.utils.constants import CHANNELS, CHUNK, FORMAT, RATE
from .utils import websocket_data_packet

load_dotenv()

QUEUE_MAX_SIZE = 10
MY_URL = os.environ.get("WEBSOCKETS_URL")
ssl_context = ssl.SSLContext()


class MicrophoneStreamer(object):
    """This handles the microphone and yields chunks of data when they are ready."""

    chunk: int = CHUNK
    channels: int = CHANNELS
    format: int = FORMAT
    rate: int = RATE

    def __init__(self):
        self._pyaudio = pyaudio.PyAudio()
        self.is_stream_open: bool = True
        self.stream = self._pyaudio.open(
            format=self.format,
            channels=self.channels,
            rate=self.rate,
            input=True,
            frames_per_buffer=self.chunk,
        )

    def __iter__(self):
        while self.is_stream_open:
            yield self.stream.read(self.chunk)

    def close(self):
        self.is_stream_open = False
        self.stream.close()
        self._pyaudio.terminate()


async def consumer(websocket):
    async for message in websocket:
        print(f"Received message: {msgpack.unpackb(message)}")


async def producer(websocket, audio_queue):
    while True:
        print("Sending chunk")
        chunck = audio_queue.get()
        await websocket.send(msgpack.packb(websocket_data_packet(chunck)))
        # THE FOLLOWING LINE IS IMPORTANT
        await asyncio.sleep(0.02)


async def handler(audio_queue):
    async with websockets.connect(MY_URL, ssl=ssl_context) as websocket:
        print("Websocket opened")
        consumer_task = asyncio.create_task(consumer(websocket))
        producer_task = asyncio.create_task(producer(websocket, audio_queue))
        done, pending = await asyncio.wait(
            [consumer_task, producer_task],
            return_when=asyncio.FIRST_COMPLETED,
            timeout=60,
        )
        for task in pending:
            task.cancel()
        # TODO: is the following useful?
        await websocket.close()


def run(audio_queue: Queue):
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

    loop.run_until_complete(handler(audio_queue))
    loop.close()


def main():
    audio_queue = Queue(maxsize=5)
    # the iterator is synchronous
    for i, chunk in enumerate(MicrophoneStreamer()):
        print("Iteration", i)
        # to simulate condition wakeword detected
        if i == 2:
            thread = Thread(
                target=run,
                args=(audio_queue,),
            )
            thread.start()
        # adds to queue
        if audio_queue.full():
            _ = audio_queue.get_nowait()
        audio_queue.put_nowait(chunk)


if __name__ == "__main__":
    main()

Issue

In the producer, there is a line marked with the comment # THE FOLLOWING LINE IS IMPORTANT.

If I do not add asyncio.sleep(...) in the producer, the consumer never receives any messages.

When I add asyncio.sleep(0) in the producer, the consumer receives messages, but very late and sporadically.

When I add asyncio.sleep(0.02) in the producer, the consumer receives messages on time.

Why does this happen, and how can I solve it? Since I need to send a message every 20 milliseconds, I cannot sleep for 20 ms on every iteration; that would probably throw off the timing.

(Note: I found this sleep workaround in this issue.)

What I tried

I thought that making the iterator asynchronous would solve the issue, but it didn't. If you want to see that implementation, I opened another question about it a few days ago here.

I also tried to dig deeper into how event loops work. From my understanding, asyncio.sleep is necessary for the event loop to decide which task to execute and to switch between them; for instance, we use it to make a task actually start running after creating it.
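A small self-contained experiment (not the question's code) that illustrates this: a task created with asyncio.create_task is only scheduled, and it gets to run only when the current coroutine awaits something that actually suspends it. Here time.sleep stands in for a blocking call such as queue.Queue.get(); the ticker task and the timings are made up for the demonstration.

```python
import asyncio
import time


async def ticker(ticks: list, stop: asyncio.Event) -> None:
    # Records a tick roughly every 5 ms, but only while the loop schedules it.
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.005)


async def run(blocking: bool) -> int:
    ticks: list = []
    stop = asyncio.Event()
    task = asyncio.create_task(ticker(ticks, stop))  # scheduled, not yet running
    for _ in range(4):
        if blocking:
            time.sleep(0.02)           # holds the thread: the loop cannot switch
        else:
            await asyncio.sleep(0.02)  # suspends this coroutine: the loop runs others
    stop.set()
    await task
    return len(ticks)


blocking_ticks = asyncio.run(run(blocking=True))
cooperative_ticks = asyncio.run(run(blocking=False))
# blocking_ticks is always 0: the ticker first runs after stop is already set.
print(blocking_ticks, cooperative_ticks)
```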

This seems a bit odd to me. Is there a workaround?

HGLR

2 Answers


This line is incorrect in async code: chunck = audio_queue.get(). It blocks until there is a value in the queue to be read, and while it is blocking, no other async task runs. Instead, it should check whether it can read something from the queue and, if not, give control back to the asyncio loop (to begin with, there is no need to wait 20 ms; an asyncio.sleep(0) should be enough to get things going).

from queue import Queue, Empty
...
async def producer(websocket, audio_queue):
    while True:
        try:
            chunck = audio_queue.get_nowait()
        except Empty:
            # Nothing to send yet: hand control back to the event loop.
            await asyncio.sleep(0)
            continue
        print("Sending chunk")
        await websocket.send(msgpack.packb(websocket_data_packet(chunck)))
        # THE FOLLOWING LINE IS IMPORTANT
        await asyncio.sleep(0)

With this, the asyncio loop gets control much more often, so other tasks can run, and it is likely that a value of 0 is also enough for the last asyncio.sleep call in the function, as shown.

What you have to keep in mind is that async programming implements cooperative concurrency: code outside the current task only runs in the "spaces" where the current code explicitly passes control to the event loop. In your original implementation, the event loop could only step through the work scheduled by the websocket.send call when execution reached the asyncio.sleep line; otherwise, it would run up to audio_queue.get() in the next iteration and block everything, including any background I/O callbacks. By making the get non-blocking and inserting an extra await asyncio.sleep(0) (yes, that is the official way to pass control to the event loop when you have nothing else to await), the loop will run the I/O in other tasks while the producer waits for something to show up in the threaded Queue.
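An alternative to polling (not part of the answer above) is to hand the blocking queue.Queue.get() to a worker thread with asyncio.to_thread (Python 3.9+; on older versions, loop.run_in_executor(None, audio_queue.get) does the same job). The producer is then suspended rather than spinning while it waits, and the event loop stays free for the consumer. A minimal sketch, where the heartbeat task stands in for the websocket consumer, a list stands in for websocket.send, and the None sentinel is an assumption added so the demo terminates:

```python
import asyncio
import queue
import threading
import time


async def producer(audio_queue: queue.Queue, sent: list) -> None:
    while True:
        # The blocking get runs in a worker thread; this coroutine is
        # suspended meanwhile, so the event loop keeps serving other tasks.
        chunk = await asyncio.to_thread(audio_queue.get)
        if chunk is None:  # demo-only sentinel: the "microphone" is done
            return
        sent.append(chunk)  # stand-in for `await websocket.send(...)`


async def heartbeat(beats: list, stop: asyncio.Event) -> None:
    # Stand-in for the consumer task: it only runs if the loop is not blocked.
    while not stop.is_set():
        beats.append(1)
        await asyncio.sleep(0.005)


def microphone(audio_queue: queue.Queue, n: int) -> None:
    # Plain thread simulating a chunk every 20 ms.
    for i in range(n):
        time.sleep(0.02)
        audio_queue.put(i)
    audio_queue.put(None)


async def main():
    audio_queue: queue.Queue = queue.Queue(maxsize=5)
    sent, beats = [], []
    stop = asyncio.Event()
    threading.Thread(target=microphone, args=(audio_queue, 5), daemon=True).start()
    hb = asyncio.create_task(heartbeat(beats, stop))
    await producer(audio_queue, sent)
    stop.set()
    await hb
    return sent, beats


sent, beats = asyncio.run(main())
print(sent, len(beats))
```

One caveat of this design: cancelling the producer task does not interrupt the worker thread, which stays blocked in get() until an item arrives, so a shutdown path needs something like the sentinel used here or a get(timeout=...).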

jsbueno
  • Thank you very much @jsbueno. Nonetheless, asyncio.sleep(0) behaves differently from asyncio.sleep(0.02). I posted another answer with a link to an excellent thread that explains why sleep(0) and sleep(nonzero) behave differently. It seems that calling asyncio.sleep(0) three times would also work. The explanation there is really clear and interesting, definitely worth the read. – HGLR Aug 21 '23 at 18:38
  • Notice that the code as I modified it will end up calling `asyncio.sleep(0)` more than 3 times between consecutive chunks; it should really work, and free you from arbitrary pauses your code can't have. – jsbueno Aug 21 '23 at 18:44
  • I think that is true if the `await websocket.send` is in the try after the `audio_queue.get_nowait()` (then we can discard the last `await asyncio.sleep(0)`), but I think that might be a small typo :) – HGLR Aug 21 '23 at 18:50
  • By the way, thank you very much, this is the second time you have answered one of my questions; I really appreciate your time! – HGLR Aug 21 '23 at 18:51
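To make the sleep(0) behaviour from these comments concrete: each await asyncio.sleep(0) lets the event loop make exactly one pass over the callbacks that are currently ready, so another task advances by one step per producer iteration. In the toy model below, the three-step receiver is a stand-in for the several hops a websocket message goes through before the consumer's `async for` wakes up; the step count is an assumption, not the actual internals of websockets.

```python
import asyncio


async def multi_step_receive(log: list, steps: int) -> None:
    # Stand-in for a receive path that needs several loop iterations.
    for i in range(steps):
        log.append(f"receive step {i}")
        await asyncio.sleep(0)
    log.append("message delivered")


async def producer(log: list, iterations: int) -> None:
    for i in range(iterations):
        log.append(f"producer {i}")
        # In the question, a blocking audio_queue.get() would sit here,
        # holding the loop for up to 20 ms on every iteration.
        await asyncio.sleep(0)  # yields for exactly one loop iteration


async def main() -> list:
    log: list = []
    receiver = asyncio.create_task(multi_step_receive(log, 3))
    await producer(log, 5)
    await receiver
    return log


log = asyncio.run(main())
# "message delivered" only appears after "producer 3".
for line in log:
    print(line)
```

Delivering the message therefore takes several producer iterations. If each of those iterations also blocks for up to 20 ms inside audio_queue.get(), delivery plausibly ends up late and sporadic, while a 20 ms asyncio.sleep leaves the loop idle long enough to run every step in between.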

For anyone wondering, see this excellent discussion; I spent (too) long looking for it.

HGLR