
I'm writing a script to make millions of API calls in parallel.

I'm using Python 3.6 with aiohttp for this purpose. I was expecting that uvloop would make it faster, but it seems to have made it slower. Am I doing something wrong?

with uvloop: 22 seconds

without uvloop: 15 seconds

import asyncio
import aiohttp
import uvloop
import time
import logging

from aiohttp import ClientSession, TCPConnector

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()

urls = ["http://www.yahoo.com", "http://www.bbcnews.com", "http://www.cnn.com",
        "http://www.buzzfeed.com", "http://www.walmart.com", "http://www.emirates.com",
        "http://www.kayak.com", "http://www.expedia.com", "http://www.apple.com",
        "http://www.youtube.com"]
bigurls = 10 * urls

def run(enable_uvloop):
    try:
        if enable_uvloop:
            loop = uvloop.new_event_loop()
        else:
            loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        start = time.time()
        conn = TCPConnector(limit=5000, use_dns_cache=True, loop=loop, verify_ssl=False)
        with ClientSession(connector=conn) as session:
            tasks = asyncio.gather(*[asyncio.ensure_future(do_request(url, session)) for url in bigurls]) # tasks to do
            results = loop.run_until_complete(tasks) # loop until done
            end = time.time()
            logger.debug('total time:')
            logger.debug(end - start)
            return results
        loop.close()
    except Exception as e:
        logger.error(e, exc_info=True)

async def do_request(url, session):
    """Fetch url and return the response body as text."""
    try:
        async with session.get(url) as response:
            resp = await response.text()
            return resp
    except Exception as e:
        logger.error(e, exc_info=True)

run(True)
#run(False)
Benyamin Jafari
skunkwerk
  • Not sure exactly what is causing the problem, but a few things to try: you don't need ensure_future around do_request, and aiohttp's own connection limit can be slow or cause problems; put a semaphore around the request to limit the number of simultaneous connections and see if that helps. – SColvin Nov 12 '17 at 12:38
  • How many times did you measure? What was the stdev of each of these times? Why testing on servers far away - why not run it against a couple of servers on the LAN (identified by IP address) to confirm? Which one was run first (use_dns_cache=True)? – Tomasz Gandor Aug 07 '18 at 21:22
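SColvin's semaphore suggestion can be sketched in pure asyncio, with asyncio.sleep standing in for the real aiohttp request so the limiting behaviour is visible without any network; the names limited_fetch and SEM_LIMIT are made up for illustration:

```python
import asyncio

SEM_LIMIT = 5  # cap on simultaneous "requests"; tune for your target servers
peak = 0       # highest concurrency actually observed
active = 0     # coroutines currently inside the semaphore

async def limited_fetch(sem, url):
    # Stand-in for `async with session.get(url)`: the semaphore ensures
    # at most SEM_LIMIT coroutines are past this point at once.
    global peak, active
    async with sem:
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)  # simulated network latency
        active -= 1
        return url

async def main(urls):
    sem = asyncio.Semaphore(SEM_LIMIT)
    return await asyncio.gather(*(limited_fetch(sem, u) for u in urls))

results = asyncio.run(main([f"http://example.com/{i}" for i in range(50)]))
```

With 50 simulated requests and a limit of 5, peak concurrency never exceeds 5, whereas a bare gather would run all 50 at once.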

3 Answers


aiohttp recommends using aiodns for DNS resolution.

Also, as I remember, this line: with ClientSession(connector=conn) as session: should be async with.
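A sketch of what that corrected setup could look like, with an aiodns-backed resolver plugged into the connector (this requires the aiodns package; the fetch_all name and the limit value are placeholders, not the asker's code):

```python
import asyncio
from aiohttp import ClientSession, TCPConnector
from aiohttp.resolver import AsyncResolver  # backed by aiodns

async def fetch_all(urls, limit=100):
    # AsyncResolver performs DNS lookups via aiodns instead of the
    # default threaded resolver; `limit` caps concurrent connections.
    conn = TCPConnector(limit=limit, resolver=AsyncResolver())
    async with ClientSession(connector=conn) as session:  # note: async with
        async def one(url):
            async with session.get(url) as resp:
                return await resp.text()
        return await asyncio.gather(*(one(u) for u in urls))
```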

Belegnar

You're not alone; I actually just got similar results (which led me to google my findings and brought me here).

My experiment involves running 500 concurrent GET requests to Google.com using aiohttp.

Here is the code for reference:

import asyncio, aiohttp, concurrent.futures
from datetime import datetime
import uvloop


class UVloopTester():
    def __init__(self):
        self.timeout = 20
        self.threads = 500
        self.totalTime = 0
        self.totalRequests = 0

    @staticmethod
    def timestamp():
        return f'[{datetime.now().strftime("%H:%M:%S")}]'

    async def getCheck(self):
        async with aiohttp.ClientSession() as session:
            response = await session.get('https://www.google.com', timeout=self.timeout)
            response.close()
        await session.close()
        return True

    async def testRun(self, id):
        now = datetime.now()
        try:
            if await self.getCheck():
                elapsed = (datetime.now() - now).total_seconds()
                print(f'{self.timestamp()} Request {id} TTC: {elapsed}')
                self.totalTime += elapsed
                self.totalRequests += 1
        except concurrent.futures.TimeoutError:
            print(f'{self.timestamp()} Request {id} timed out')

    async def main(self):
        await asyncio.gather(*[asyncio.ensure_future(self.testRun(x)) for x in range(self.threads)])

    def start(self):
        # comment these lines to toggle
        uvloop.install()
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

        loop = asyncio.get_event_loop()
        now = datetime.now()
        loop.run_until_complete(self.main())
        elapsed = (datetime.now() - now).total_seconds()
        print(f'{self.timestamp()} Main TTC: {elapsed}')
        print()
        print(f'{self.timestamp()} Average TTC per Request: {self.totalTime / self.totalRequests}')
        if len(asyncio.Task.all_tasks()) > 0:
            for task in asyncio.Task.all_tasks():
                task.cancel()
            try:
                loop.run_until_complete(asyncio.gather(*asyncio.Task.all_tasks()))
            except asyncio.CancelledError:
                pass
        loop.close()


test = UVloopTester()
test.start()

I haven't planned out and executed any sort of careful experiment where I'm logging my findings and calculating standard deviations and p-values. But I have run this a (tiring) number of times and have come up with the following results.

Running without uvloop:

  • loop.run_until_complete(main()) takes about 10 seconds.
  • the average time to complete per request is about 4 seconds.

Running with uvloop:

  • loop.run_until_complete(main()) takes about 16 seconds.
  • the average time to complete per request is about 8.5 seconds.

I've shared this code with a friend of mine, who is actually the one who suggested I try uvloop (since he gets a speed boost from it). After running it several times, his results confirm that he does in fact see a speed increase from uvloop (shorter times to complete for both main() and the average request).

This leads me to believe that the difference in our results comes down to our setups: I'm using a Debian virtual machine with 8 GB of RAM on a mid-tier laptop, while he's using a native Linux desktop with a lot more 'muscle' under the hood.

My answer to your question is: no, I don't believe you're doing anything wrong, because I'm experiencing the same results and it doesn't appear that I'm doing anything wrong either; that said, any constructive criticism is welcome and appreciated.

I wish I could be of more help; I hope my chiming in can be of some use.


I tried a similar experiment and see no real difference between uvloop and asyncio event loops for parallel HTTP GETs:

asyncio event loop: avg=3.6285968542099 s. stdev=0.5583842811362075 s.
uvloop event loop: avg=3.419699764251709 s. stdev=0.13423859428541632 s.

It might be that the noticeable benefits of uvloop come into play when it is used in server code, i.e. for handling many incoming requests.
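That server-side hypothesis is easy to poke at with a minimal loopback echo server; the sketch below opts into uvloop when it is installed and silently falls back to the stock loop otherwise (the handler and the ephemeral port choice are arbitrary):

```python
import asyncio

try:
    import uvloop  # optional speed-up; without it, the stock loop is used
    asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
except ImportError:
    pass

async def handle(reader, writer):
    # Echo handler: the kind of transport-heavy path where uvloop's
    # libuv-based transports tend to pay off.
    data = await reader.read(1024)
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'ping')
    await writer.drain()
    reply = await reader.read(1024)
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

reply = asyncio.run(main())
```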

Code:

import time
from statistics import mean, stdev
import asyncio
import uvloop
import aiohttp

urls = [
    'https://aws.amazon.com', 'https://google.com', 'https://microsoft.com', 'https://www.oracle.com/index.html',
    'https://www.python.org', 'https://nodejs.org', 'https://angular.io', 'https://www.djangoproject.com',
    'https://reactjs.org', 'https://www.mongodb.com', 'https://reinvent.awsevents.com',
    'https://kafka.apache.org', 'https://github.com', 'https://slack.com', 'https://authy.com',
    'https://cnn.com', 'https://fox.com', 'https://nbc.com', 'https://www.aljazeera.com',
    'https://fly4.emirates.com', 'https://www.klm.com', 'https://www.china-airlines.com',
    'https://en.wikipedia.org/wiki/List_of_Unicode_characters', 'https://en.wikipedia.org/wiki/Windows-1252'
]

def timed(func):
    async def wrapper():
        start = time.time()
        await func()
        return time.time() - start
    return wrapper

@timed
async def main():
    conn = aiohttp.TCPConnector(use_dns_cache=False)
    async with aiohttp.ClientSession(connector=conn) as session:
        coroutines = [fetch(session, url) for url in urls]
        await asyncio.gather(*coroutines)

async def fetch(session, url):
    async with session.get(url) as resp:
        await resp.text()

asyncio_results = [asyncio.run(main()) for i in range(10)]
print(f'asyncio event loop: avg={mean(asyncio_results)} s. stdev={stdev(asyncio_results)} s.')

# Change to uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

uvloop_results = [asyncio.run(main()) for i in range(10)]
print(f'uvloop event loop: avg={mean(uvloop_results)} s. stdev={stdev(uvloop_results)} s.')
Otto