
I have a method that asynchronously fetches the og:image meta tag of HTML pages. I'm using httpx to stream the response so that I can stop reading as soon as I encounter the og:image tag. My problem is that I'm hitting a serious memory leak that crashes the whole app after 40+ requests. This is the code sample:

import httpx

async def getImg(url):
    og_line = None
    async with httpx.AsyncClient() as client:
        async with client.stream('GET', url) as response:
            # Read the body line by line and stop at the first og:image hit
            async for chunk in response.aiter_lines():
                if "og:image" in chunk:
                    og_line = chunk
                    break
    return og_line

My question is: am I doing something horrible here by breaking out of an `async for`, and is this expected (if yes, I'd love to know how I can do it differently), or is this unexpected behavior? Thanks.

KiKoS
  • It's expected - the `async with` context manager should handle properly closing the response regardless of whether you have read it through. The question is how you have come to the conclusion that the leak is connected to `break`ing out of the loop, or even to this code at all? Does the leak disappear if you comment out the `break`? – user4815162342 Jun 28 '21 at 13:21
  • Yes, as soon as I comment it out there is no memory leak. Here are two memory summaries (done with pympler) for each call: [with break](https://pastebin.com/bidg8whJ), [without break](https://pastebin.com/iRMVyNFR) – KiKoS Jun 28 '21 at 13:56
  • Also, I forgot to add that my application is using FastAPI; it's basically an endpoint that does nothing but call that function and return the result. – KiKoS Jun 28 '21 at 14:01
  • Can you please provide a [mcve]? There are [some](https://github.com/encode/httpx/issues/978) [reports](https://stackoverflow.com/questions/66534027/python-requests-httpx-repsonses-and-garbage-collection) of httpx memory leaks, but it's hard to say if that matches your case. – MisterMiyagi Jun 28 '21 at 14:07
  • Just to correct myself: by "it's expected", I meant "it's expected to work correctly", not "it's expected to leak" (as I've now realized the question implies). If you can construct a minimal reproducible example, you should definitely report it to the httpx maintainers. – user4815162342 Jun 28 '21 at 14:10
  • I created a [minimal reproducible example](https://github.com/KiKoS0/httpx-mem-leak) and also found out that it only happens with specific websites I'm currently working on. All you have to do to try it out is to switch between `normal-links.txt` and `leaking-links.txt` and also comment out the `break`. – KiKoS Jun 28 '21 at 15:16 (a sketch of that kind of measurement loop follows below)
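For context, here is a minimal sketch of the kind of per-request measurement loop the comments above describe: it calls `getImg` for every URL in one of the link files from the reproduction repo and prints a pympler memory summary after each call. The measurement loop itself is an assumption for illustration, not the exact code from the linked repository.

    import asyncio

    from pympler import muppy, summary

    # Assumes the getImg coroutine from the question is defined in this module.
    async def main():
        with open("leaking-links.txt") as f:  # or normal-links.txt
            urls = [line.strip() for line in f if line.strip()]

        for url in urls:
            await getImg(url)
            # Print a summary of all objects the interpreter currently holds
            objs = muppy.get_objects()
            summary.print_(summary.summarize(objs))

    asyncio.run(main())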

1 Answer


You should be able to do something like this. Use `client.get` instead of `client.stream`; you don't need to deal with this as a stream. You can await `client.get` within the `AsyncClient` context. Basically you are saying: when this API call has returned, give me what the API returned as the response.

Once you aren't dealing with the streaming coroutines, the logic is a lot easier. The response is still handled properly by the context manager.

    import httpx

    async def getImg(url):
        og_line = None
        async with httpx.AsyncClient() as client:
            # Download the full page, then scan it line by line
            response = await client.get(url)
            response.raise_for_status()
            for chunk in response.iter_lines():
                if "og:image" in chunk:
                    og_line = chunk
                    break
        return og_line
David Rinck
  • But if I disable response streaming, I'll be downloading the whole page, which after trying it out takes way more time to give a result. – KiKoS Jun 28 '21 at 15:21
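For what it's worth, the speed difference this comment mentions can be checked with a rough timing sketch like the one below. `getImg` is the streaming version from the question; `getImgFull` is a hypothetical name for a coroutine built from the answer's snippet, and the URL is a placeholder.

    import asyncio
    import time

    # Rough comparison of the streaming version (getImg, from the question)
    # against a full-download variant (getImgFull is a hypothetical name for
    # a coroutine based on the answer's snippet).
    async def time_call(fetch, url):
        start = time.perf_counter()
        await fetch(url)
        return time.perf_counter() - start

    async def main():
        url = "https://example.com"  # placeholder URL
        print("streaming:", await time_call(getImg, url))
        print("full download:", await time_call(getImgFull, url))

    asyncio.run(main())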