
I'm trying to determine how to correctly use cProfile and pstats with asyncio code. I am profiling my script by running cProfile.run('loop.run_until_complete(main())', 'profile.stats').
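For reference, a minimal, self-contained version of that setup looks like the sketch below (the coroutine here is just a placeholder standing in for my real `main()`; `runctx` is used so the statement can see local names):

```python
import asyncio
import cProfile
import pstats

async def main():
    # placeholder standing in for the real fetch/process workload
    await asyncio.sleep(0.1)

loop = asyncio.new_event_loop()
# runctx avoids relying on __main__'s globals, unlike cProfile.run()
cProfile.runctx('loop.run_until_complete(main())', globals(), locals(), 'profile.stats')
loop.close()

stats = pstats.Stats('profile.stats')
stats.sort_stats(pstats.SortKey.TIME)  # Python 3.7+
stats.print_stats(10)
```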

After using pstats to sort by pstats.SortKey.TIME, I get the following output:

In [9]: sorted_stats.print_stats()
Fri Feb 15 14:40:18 2019    profile.txt

         67411199 function calls (67230882 primitive calls) in 305.899 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3617  273.162    0.076  273.162    0.076 {method 'poll' of 'select.epoll' objects}
14094092/14093931    1.527    0.000    1.904    0.000 {built-in method builtins.isinstance}
    44901    1.166    0.000    1.166    0.000 {built-in method posix.stat}
   784052    1.028    0.000    1.479    0.000 /usr/lib/python3/dist-packages/soupsieve/util.py:41(lower)
    32070    0.887    0.000    1.565    0.000 /usr/lib/python3/dist-packages/bs4/element.py:620(<genexpr>)
  4554545    0.729    0.000    0.756    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1331(descendants)
  1026560    0.726    0.000    3.358    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1502(search)
   448385    0.716    0.000    1.411    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1528(_matches)
      104    0.689    0.007    4.771    0.046 /usr/lib/python3/dist-packages/bs4/builder/_lxml.py:283(feed)
   255143    0.655    0.000    1.575    0.000 /usr/lib/python3/dist-packages/bs4/__init__.py:391(endData)
   137377    0.520    0.000    3.692    0.000 /usr/lib/python3/dist-packages/html5lib/_tokenizer.py:49(__iter__)
     8574    0.510    0.000    2.392    0.000 /usr/lib/python3.7/traceback.py:318(extract)
   167700    0.494    0.000    1.061    0.000 /usr/lib/python3/dist-packages/bleach/html5lib_shim.py:199(charsUntil)
    53607    0.487    0.000    0.589    0.000 /usr/lib/python3/dist-packages/bs4/builder/__init__.py:158(_replace_cdata_list_attribute_values)
   299357    0.485    0.000    0.485    0.000 /usr/lib/python3/dist-packages/bs4/element.py:258(setup)
    22253    0.467    0.000    5.928    0.000 /usr/lib/python3/dist-packages/bs4/element.py:592(_find_all)
   448385    0.437    0.000    2.225    0.000 /usr/lib/python3/dist-packages/bs4/element.py:1461(search_tag)
    83147    0.424    0.000    2.365    0.000 /usr/lib/python3/dist-packages/bs4/builder/_lxml.py:176(start)
  1293015    0.418    0.000    0.556    0.000 /usr/lib/python3/dist-packages/soupsieve/css_match.py:94(is_tag)
  4824419    0.409    0.000    0.409    0.000 {method 'append' of 'list' objects}
    73068    0.404    0.000    0.634    0.000 /home/jess/.local/lib/python3.7/site-packages/rfc3986/normalizers.py:127(encode_component)
   868978    0.403    0.000    0.773    0.000 /usr/lib/python3/dist-packages/bleach/html5lib_shim.py:192(char)
   454702    0.385    0.000    0.385    0.000 {method 'match' of 're.Pattern' objects}
  2326981    0.374    0.000    0.415    0.000 {built-in method builtins.hasattr}
       52    0.334    0.006    5.732    0.110 /usr/lib/python3/dist-packages/html5lib/html5parser.py:196(mainLoop)


   ...       ...       ...      ...      ...          ... 

As you can see, almost all of the execution time is showing up as {method 'poll' of 'select.epoll' objects}.

I don't have enough experience with cProfile or asyncio to know how to interpret this result. I can't tell whether the stats show my program spending the majority of its execution time in poll() because (a) cProfile doesn't work well with asyncio and isn't giving correct stats, (b) this is normal behavior and I should just ignore the first line, or (c) this is an indicator of some sort of bug in my program (which is indeed terribly slow right now, for reasons I have yet to determine) ...

Is it normal for async processes to spend this much execution time in poll()? Is there a better way to profile async code, either with cProfile or something else, that will give me more accurate results? Or is this a sign that there is likely some sort of bug in my code causing this?
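For what it's worth, one thing I'm considering is asyncio's debug mode, which at least flags callbacks that block the event loop for too long. A sketch, with a deliberately blocking call (the 100 ms threshold is just an example value):

```python
import asyncio
import time

async def blocking_step():
    # time.sleep() blocks the event loop; debug mode should flag this
    time.sleep(0.2)

async def main():
    loop = asyncio.get_running_loop()
    loop.slow_callback_duration = 0.1  # warn when a callback runs > 100 ms
    await blocking_step()
    return True

# debug=True enables slow-callback warnings (logged via the 'asyncio' logger)
done = asyncio.run(main(), debug=True)
print(done)
```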


NOTE: for context, the code I was profiling basically fetches 50 pages, processes the HTML, and then does some text/language processing. The fetch() part could explain a good portion of the time in poll(). But this is on a fast internet connection (> 5 MB/sec), so I don't think it should be spending that percentage of the time fetching just a few dozen text/html pages. It seems like something else is going on, or the stats aren't correct ...

martineau
J. Taylor
  • It's entirely possible the code is spending most of its time polling, but it's impossible to tell without seeing the actual code. There's an add-on called [line_profiler](https://pypi.org/project/line_profiler/) that might be more useful to you than `cProfile` in this context (although I've never tried using it with `asyncio`). – martineau Feb 16 '19 at 00:04
  • Have you tried running just one instance (say, download a single page) without `asyncio` and see how that performs? – suvayu Feb 16 '19 at 19:32
  • @martineau the issue I'm having is that the application is several thousand lines of code, and the reason I'm trying to use the profiler in the first place is to find out where it is slow. But because the profiler is giving me the results above, I can't tell *where* in the program it is polling, and didn't feel like sharing the source for the entire application here was appropriate. What I'm asking is how to use the profiler to find out which part of my own code is polling, rather than a vague "it's in poll() somewhere" result that doesn't help me past "it's probably a function doing IO" – J. Taylor Feb 16 '19 at 19:37
  • @suvayu the fetch function itself needs asyncio because it is using aiohttp.ClientSession.get() to fetch the pages, so I can't really test it without asyncio. However, the unit tests for fetch() show no problems for downloading individual pages. It's only when I am concurrently fetching/processing many pages that I run into issues. It seems like the problem isn't the speed for the individual fetches, but rather something elsewhere either when pages are being queued, processed, errors handled, etc ... I have no idea really, which is why I was trying to profile :( – J. Taylor Feb 16 '19 at 19:42
  • 1
    J. Taylor: Using the linked line-profiler would allow you to find out where it's slow at the line-of-code level, which is why I mentioned it. You can apply the profiling selectively -- i.e. to just the part of the code you're interested in. – martineau Feb 16 '19 at 20:25
  • Yes, I meant to thank you for sharing the line profiler. Unfortunately, I'm still in the same situation of not knowing which line to profile, so it won't help me here (short of randomly throwing it all over the place wherever there is IO and hoping that I find the right line) ... but it will be a useful tool in the future for sure! I feel like I need something like a "profiler for IO operations", that can tell me where specifically the poll() is happening ... – J. Taylor Feb 17 '19 at 18:03
  • 1
    I found this link useful when trying to profile my own `asyncio` code: https://www.roguelynn.com/words/asyncio-profiling/ – erik Jun 23 '21 at 15:05

1 Answer


TL;DR: use better tools

I've found it's more practical to profile something when it's reproducible, because then any code change can be validated (whether or not it improves performance).

If you can convert your code into a test case using pytest, then consider https://pypi.org/project/pytest-profiling/ : it's easy to set up, and the visual output is relatively easy to understand.
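A sketch of what such a test case might look like; the coroutine is a stand-in for your real fetch/parse pipeline, and the function name and URL are illustrative. Once pytest-profiling is installed, run it with `pytest --profile`:

```python
import asyncio

async def fetch_and_parse(url):
    # stand-in for the real aiohttp fetch + HTML parse
    await asyncio.sleep(0.01)
    return "<html></html>"

def test_fetch_and_parse():
    # a plain pytest test; pytest-profiling hooks in via the --profile flag
    html = asyncio.run(fetch_and_parse("https://example.com"))
    assert html.startswith("<html")
```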

Another option is to use a statistical profiler, like https://pypi.org/project/pprofile/

Yet another, if you fancy profiling in production, is to use a service, like https://docs.datadoghq.com/tracing/profiler/

Finally, be clear about what you measure.

Most profilers can collect both wall-clock time and CPU time.

A naïve asyncio program is single-threaded and, given enough CPU share, is in one of two states at any given time: either crunching or waiting. If your "problem" is too much crunching, then a profiler is the answer, and often CPU time and wall time are roughly the same. If the "problem" is actually waiting, then you probably want to look at "who is waiting for what". Shameless plug: my https://pypi.org/project/awaitwhat/ does exactly that for Python 3.7~3.9.
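A quick way to tell the two states apart, without any profiler, is to compare wall-clock time against CPU time around the suspect section. A minimal sketch, where the two coroutines stand in for real I/O and real processing:

```python
import asyncio
import time

async def fetch_like():
    # stands in for an I/O-bound await, e.g. an HTTP request
    await asyncio.sleep(0.2)

async def crunch_like():
    # stands in for CPU-bound work, e.g. HTML/text processing
    return sum(i * i for i in range(200_000))

async def main():
    wall0, cpu0 = time.perf_counter(), time.process_time()
    await fetch_like()
    await crunch_like()
    wall = time.perf_counter() - wall0
    cpu = time.process_time() - cpu0
    return wall, cpu

wall, cpu = asyncio.run(main())
# wall >> cpu means the program is mostly waiting, not crunching
print(f"wall: {wall:.3f}s  cpu: {cpu:.3f}s")
```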

Dima Tisnek