I don't expect trio to run tasks in any particular order. It is async, after all. But I noticed something strange and wanted to ask whether anyone could explain what might have happened:
- I wanted to test the rate of data ingestion into Google's Pub/Sub when I send small messages one at a time. To focus on the I/O of pushing to Pub/Sub, I sent the messages asynchronously, and I used trio because, well, I want to keep my head from exploding.
- I specifically wanted to see how fast Pub/Sub would be with its message ordering capability turned on. I really just wanted to test throughput, and since I was using an async process, I didn't expect the messages to arrive in any particular order, but I tagged them with a number just out of curiosity.
- I noticed that the messages were processed in Pub/Sub (and therefore sent to Pub/Sub) in exactly the opposite order from the one in which the imperative code creates them.
Here is the important snippet (I can provide more if it is helpful):
```python
async with open_nursery() as nursery:
    for num in range(num_messages):
        logger.info(f"===Creating data entry # {num}===")
        # You can ignore this, it is just a toy data generator. It is synchronous code, but _very_ fast.
        raw_data = gen_sample(DATASET, fake_generators=GENERATOR)
        # <== This is the CRITICAL LINE, adding the message number so that I can observe the ordering.
        raw_data["message number"] = num
        data = dumps(raw_data).encode("utf-8")
        nursery.start_soon(publish, publisher, topic_path, data, key)
```
And here is the publish function:
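```python
from json import loads

from google.cloud.pubsub_v1 import PublisherClient


async def publish(
    publisher: PublisherClient, topic: str, data: bytes, ordering_key: str
):
    future = publisher.publish(topic, data=data, ordering_key=ordering_key)
    # result() blocks until Pub/Sub returns the message ID (or raises);
    # note that there is no await anywhere in this coroutine.
    result = future.result()
    logger.info(
        f"Published {loads(data)} on {topic} with ordering key {ordering_key} "
        f"Result: {result}"
    )
```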
When I look at the logs in Pub/Sub, they are 100% consistently in reverse order: I see "message number" 50_000 first, then 49_999, 49_998, ..., 3, 2, 1. Pub/Sub is maintaining ordering, so this means that, somehow, the async code above is starting the very last task passed to nursery.start_soon first.
I'm not sure why that is. I don't understand exactly how Pub/Sub's Future works, because the documentation is sparse (at least what I found), so it is possible that the "problem" lies with Google's PublisherClient.publish() method, or with the result() method of the future it returns.
But it seems to me that it is actually due to nursery.start_soon. Any ideas why the messages would arrive in exactly the opposite order from how things are written imperatively?