Python asyncio vs ThreadPoolExecutor - inconsistent results for a purely I/O based task

Question

I recently ran into a problem where I needed to fetch a list of URLs as quickly as possible.

So naturally, I set up a small test to see what works best.

Approach 1 - asyncio

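import asyncio

import httpx

# `symbols` is a collection of ticker symbols defined elsewhere in the full code.


async def test_async():
    # One shared AsyncClient, so connections are pooled across all requests.
    async with httpx.AsyncClient() as client:
        await asyncio.gather(*(fetch_async(client, symbol) for symbol in symbols))


async def fetch_async(client, symbol):
    await client.get(
        f"https://query1.finance.yahoo.com/v8/finance/chart/{symbol}.NS",
        timeout=None,
    )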

Approach 2 - ThreadPoolExecutor

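import asyncio
from concurrent.futures import ThreadPoolExecutor

import httpx


async def test_threads():
    # One worker thread per symbol, and one shared Client so connections are reused.
    with ThreadPoolExecutor(max_workers=len(symbols)) as pool, httpx.Client() as client:
        # Inside a coroutine, get_running_loop() is the idiomatic way to get the loop.
        loop = asyncio.get_running_loop()

        await asyncio.gather(
            *(
                loop.run_in_executor(pool, fetch_sync_fn(client, symbol))
                for symbol in symbols
            )
        )


def fetch_sync_fn(client, symbol):
    # Return a zero-argument callable, as run_in_executor expects.
    def fn():
        client.get(
            f"https://query1.finance.yahoo.com/v8/finance/chart/{symbol}.NS",
            timeout=None,
        )

    return fn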

Results on a 2013 MacBook Pro

In [3]: %timeit asyncio.run(test_threads())
1.41 s ± 87.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [4]: %timeit asyncio.run(test_async())
1.24 s ± 62.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Results on a DigitalOcean $5 server

In [4]: %timeit asyncio.run(test_threads())
5.94 s ± 66.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [3]: %timeit asyncio.run(test_async())
10.7 s ± 97.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Results on Google Colab

%timeit loop.run_until_complete(test_threads())
1 loop, best of 3: 723 ms per loop

%timeit loop.run_until_complete(test_async())
1 loop, best of 3: 597 ms per loop

Questions

What is the reason for this inconsistency? Why is there a different winner on the server vs local machine?
Why are both tests slower on a server? Shouldn't a pure network task be faster on a server that has a faster network connection?

The full code is available as a GitHub gist.

If your Droplet has a shared CPU, you'll be contending with a lot more than your own process for CPU time. — dirn, Jul 14 '20 at 20:41
But this isn't a CPU bound task, why would CPU be a bottleneck in this case? — Dev Aggarwal, Jul 14 '20 at 20:47
Also, I would like to add that you are using a public URL. It may be subject to others' traffic, which can alter the timing of responses. One last thing: by using `async with httpx.AsyncClient()` you connect only once and reuse the connection for the other requests, while the threaded version may be connecting from scratch on every request. — lsabi, Jul 14 '20 at 21:21
@lsabi Okay, changed the threaded version to re-use the client. It has made the threaded version faster, but the DO box tests still make my brain go haywire. — Dev Aggarwal, Jul 14 '20 at 22:14
Not sure what I can do about the public URL. Happy to use something else if it helps. — Dev Aggarwal, Jul 14 '20 at 22:16
I watched htop while running the script; CPU usage peaks at around 40%. — Dev Aggarwal, Jul 15 '20 at 00:24
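
Two points raised in the comments, the variability of a public URL and whether CPU matters at all, can be checked without touching the network. The following is only a minimal sketch under stated assumptions, not the code from the gist: the real HTTP calls are swapped for sleeps of a fixed, made-up latency, and the names `N_TASKS`, `LATENCY`, `fake_fetch_async`, `fake_fetch_sync` and `measure` are hypothetical. It reports wall-clock time next to process CPU time for each approach, so any gap between the two approaches on a given machine reflects scheduling and thread overhead rather than the remote server.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

N_TASKS = 50    # hypothetical number of "URLs" to fetch
LATENCY = 0.05  # hypothetical per-request latency, in seconds


async def fake_fetch_async(_i):
    # Stand-in for `await client.get(...)`: yields to the event loop while waiting.
    await asyncio.sleep(LATENCY)


def fake_fetch_sync(_i):
    # Stand-in for the blocking `client.get(...)`: blocks its worker thread.
    time.sleep(LATENCY)


async def run_async():
    await asyncio.gather(*(fake_fetch_async(i) for i in range(N_TASKS)))


async def run_threads():
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=N_TASKS) as pool:
        await asyncio.gather(
            *(loop.run_in_executor(pool, fake_fetch_sync, i) for i in range(N_TASKS))
        )


def measure(coro_fn):
    """Run coro_fn once and return (wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time of the whole process, all threads
    asyncio.run(coro_fn())
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    for name, fn in [("async", run_async), ("threads", run_threads)]:
        wall, cpu = measure(fn)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

If CPU time turns out to be a large fraction of wall time on one machine but not on another, that would suggest the task is less purely I/O bound there than assumed. Real requests add TLS handshakes and response parsing on top of this, so the sketch only gives a lower bound on the CPU work involved.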
