15

The Getting Started docs for aiohttp give the following client example:

async with aiohttp.ClientSession() as session:
    async with session.get('https://api.github.com/events') as resp:
        print(resp.status)
        print(await resp.text())

I'm having trouble understanding when resp.status becomes available. My understanding is that the coroutine releases control at the await resp.text() line. How can I possibly access the status before waiting for the response to come back?

robin.keunen

4 Answers

12

Important distinction: await ... may release control to the event loop, for example if the awaited data is not available fast enough. The same goes for the async with ... statement. Your code therefore does not reach the line print(resp.status) until resp is available.

For example, this code:

import aiohttp
import asyncio
import urllib.parse
import datetime

async def get(session, url):
    print("[{:%M:%S.%f}] getting {} ...".format(datetime.datetime.now(), urllib.parse.urlsplit(url).hostname))
    async with session.get(url) as resp:
        print("[{:%M:%S.%f}] {}, status: {}".format(datetime.datetime.now(), urllib.parse.urlsplit(url).hostname, resp.status))
        doc = await resp.text()
        print("[{:%M:%S.%f}] {}, len: {}".format(datetime.datetime.now(), urllib.parse.urlsplit(url).hostname, len(doc)))

async def main():
    session = aiohttp.ClientSession()

    url = "http://demo.borland.com/Testsite/stadyn_largepagewithimages.html"
    f1 = asyncio.ensure_future(get(session, url))
    print("[{:%M:%S.%f}] added {} to event loop".format(datetime.datetime.now(), urllib.parse.urlsplit(url).hostname))

    url = "https://stackoverflow.com/questions/46445019/aiohttp-when-is-the-response-status-available"
    f2 = asyncio.ensure_future(get(session, url))
    print("[{:%M:%S.%f}] added {} to event loop".format(datetime.datetime.now(), urllib.parse.urlsplit(url).hostname))

    url = "https://api.github.com/events"
    f3 = asyncio.ensure_future(get(session, url))
    print("[{:%M:%S.%f}] added {} to event loop".format(datetime.datetime.now(), urllib.parse.urlsplit(url).hostname))

    await f1
    await f2
    await f3

    await session.close()  # close() is a coroutine in aiohttp 3.x and must be awaited

if __name__ == '__main__':
    asyncio.run(main())  # Python 3.7+; replaces get_event_loop()/run_until_complete()

can produce this result:

[16:42.415481] added demo.borland.com to event loop
[16:42.415481] added stackoverflow.com to event loop
[16:42.415481] added api.github.com to event loop
[16:42.415481] getting demo.borland.com ...
[16:42.422481] getting stackoverflow.com ...
[16:42.682496] getting api.github.com ...
[16:43.002515] demo.borland.com, status: 200
[16:43.510544] stackoverflow.com, status: 200
[16:43.759558] stackoverflow.com, len: 110650
[16:43.883565] demo.borland.com, len: 239012
[16:44.089577] api.github.com, status: 200
[16:44.318590] api.github.com, len: 43055

Clarification (thx @deceze): Looking at the timestamps in brackets, you can see every coroutine releasing control twice: once after sending its request, and again while awaiting the text of the response. In this run, borland's text was only ready to be printed after stackoverflow's had already been printed, even though borland was requested first. Its page is much larger, although server speed and network conditions also influence the order.
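The same interleaving can be reproduced without any network access. The following is only a sketch: fake_get is a hypothetical stand-in for session.get() followed by resp.text(), and the delays are made-up numbers chosen so that the order is predictable.

```python
import asyncio

async def fake_get(name, header_delay, body_delay, log):
    # Hypothetical stand-in for an HTTP request; both delays are invented.
    log.append(f"{name}: request sent")
    await asyncio.sleep(header_delay)  # 1st suspension: waiting for status + headers
    log.append(f"{name}: status known")
    await asyncio.sleep(body_delay)    # 2nd suspension: waiting for the body
    log.append(f"{name}: body read")

async def simulate():
    log = []
    # Both coroutines run concurrently on one event loop.
    await asyncio.gather(
        fake_get("slow-body", 0.02, 0.30, log),
        fake_get("fast-body", 0.10, 0.05, log),
    )
    return log

if __name__ == "__main__":
    for line in asyncio.run(simulate()):
        print(line)
```

With these delays, slow-body learns its status first but finishes its body last, which mirrors the borland/stackoverflow ordering in the output above.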

user3608078
  • *"borland, in contrast to stackoverflow and github releases a second time"* - No, they all "release a second time"; the timing of downloading and reading the body, depending on the individual server speeds, network conditions and scheduling within the event loop, simply works out to the response order you happen to see. – deceze Sep 28 '17 at 08:03
  • Thank you. So if I understand correctly, `async with` waits for the `resp` coroutine before entering the block. Just like `await` waits for `resp.text()` - another coroutine in the response - before proceeding to following statements. – robin.keunen Sep 28 '17 at 15:58
  • Well, sort of: `resp` is a variable and `session.get()` is the coroutine. But your idea is correct; the line waits until it can assign a value to `resp`. – user3608078 Sep 29 '17 at 15:58
  • Then why do we need to use the await keyword in `await resp.text()`? If the response object is fully available inside the `async with` why not just call `resp.text()` without the await ? – DollarAkshay Sep 30 '22 at 10:07
4

You first receive the HTTP status line and response headers; the status code is on the first line. If you choose, you can then read the response body (here with resp.text()). Since the headers are always relatively small and the body may be very large, aiohttp lets you read the two separately.
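To make the head/body split concrete, here is a minimal sketch using only the standard library's asyncio streams. It does not show aiohttp's internals: the toy server, the handle and fetch names, and the 0.2 s delay are all assumptions made for illustration. The server sends the status line and headers right away and deliberately holds back the body.

```python
import asyncio

BODY = b"x" * 1000  # made-up payload

async def handle(reader, writer):
    # Toy server: send status line + headers at once, delay the body.
    await reader.readuntil(b"\r\n\r\n")
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Length: {len(BODY)}\r\n"
        "Connection: close\r\n\r\n"
    )
    writer.write(head.encode())
    await writer.drain()
    await asyncio.sleep(0.2)  # the body arrives later
    writer.write(BODY)
    await writer.drain()
    writer.close()
    await writer.wait_closed()

async def fetch():
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    loop = asyncio.get_running_loop()
    async with server:
        reader, writer = await asyncio.open_connection("127.0.0.1", port)
        writer.write(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
        await writer.drain()
        start = loop.time()
        # First await: only the status line and headers.
        head = await reader.readuntil(b"\r\n\r\n")
        status = int(head.split(b" ", 2)[1])
        t_status = loop.time() - start
        lines = head.decode().split("\r\n")[1:]
        headers = dict(line.split(": ", 1) for line in lines if line)
        # Second await: the body, which may be large or slow.
        body = await reader.readexactly(int(headers["Content-Length"]))
        t_body = loop.time() - start
        writer.close()
        await writer.wait_closed()
    return status, t_status, len(body), t_body

if __name__ == "__main__":
    status, t_status, size, t_body = asyncio.run(fetch())
    print(f"status {status} after {t_status:.3f}s, {size} bytes after {t_body:.3f}s")
```

The status is known as soon as the first await returns, well before the body has been sent, which is the situation the question asks about.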

deceze
2

The resp object is available inside the async with block, so resp.status is available too. You can also await some of its methods, such as resp.text(), but that doesn't exit the async with block: you can keep working with resp after the await has completed.

Alex Pshenko
2

I was very confused by this as well, and didn't find the answers above cleared it up enough for me. So, I dug into the documentation and found this page (https://docs.aiohttp.org/en/latest/http_request_lifecycle.html) that makes it clear that the "async with" does the equivalent of an await for the get, and then further clarifies that: "aiohttp loads only the headers when .get() is executed, letting you decide to pay the cost of loading the body afterward, in a second asynchronous operation".

To summarize, the async with session.get does an await to get the response headers, and that's why you can access the status right after the get. Response bodies can be big and that's why there's a separate await for processing the body.
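The mechanism can be sketched with a hand-written async context manager. FakeRequest and FakeResponse below are hypothetical classes invented for this illustration, not aiohttp's own; they only mimic the shape of the behaviour described above, where entering the block awaits the head and text() awaits the body.

```python
import asyncio

class FakeResponse:
    """Hypothetical stand-in for a response object (not aiohttp's class)."""
    def __init__(self, events):
        self.status = 200  # known as soon as the head has arrived
        self._events = events

    async def text(self):
        self._events.append("body: awaiting")
        await asyncio.sleep(0.01)  # simulated body download
        self._events.append("body: done")
        return "hello"

class FakeRequest:
    """Hypothetical stand-in for what session.get(...) returns."""
    def __init__(self, events):
        self._events = events

    async def __aenter__(self):
        # 'async with' awaits this method before the block body runs.
        self._events.append("enter: awaiting head")
        await asyncio.sleep(0.01)  # simulated wait for status line + headers
        self._events.append("enter: head received")
        return FakeResponse(self._events)

    async def __aexit__(self, exc_type, exc, tb):
        self._events.append("exit: connection released")
        return False

async def demo():
    events = []
    async with FakeRequest(events) as resp:
        events.append(f"status {resp.status}")  # no await needed here
        text = await resp.text()                # separate await for the body
    return events, text

if __name__ == "__main__":
    events, text = asyncio.run(demo())
    print("\n".join(events))
```

This also suggests why resp.text() still needs await inside the block: in this model only the head has arrived when the block is entered, so reading the body is a second asynchronous operation.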

I am now at peace with the example code :-)

rborchert