
I have a class Foo that implements a method compute. I would like to see how long compute takes and where it spends most of its time when executing. To answer the first question, I used timeit.default_timer in a loop and averaged the results:

import numpy as np
from timeit import default_timer as Timer

foo = Foo()
n = int(10e8)
times = np.empty(n)

for i in range(n):
    start = Timer()
    foo.compute(i)
    times[i] = Timer() - start

print(np.sum(times) / times.size)

This tells me the average execution time of each compute call, easy enough; except it's slower than I expected.

Now I would like to break down the execution profile of compute, but the following doesn't seem to do the trick:

import cProfile

for i in range(n):
    cProfile.run(foo.compute(i))

I didn't think it would work anyhow, but the docs seem to suggest I would have to put this loop in a function and then profile that function. I tried that, but it doesn't show what's going on inside compute, which is what I want. How can this be accomplished?

  • Do you need to get the `cProfile` output for every single iteration? – tania Oct 09 '20 at 13:31
  • @tania I do not need to output the stats each iteration. I would like to output the total stats after the loop is exhausted. `compute` has several functions inside it, so I want to determine which is taking the longest in a "real world" scenario. –  Oct 09 '20 at 13:32
  • Is there a reason why you want to profile *inside* the loop, instead of profiling the entire loop? Profiling relies on having *several* samples of the same execution; making several profiles with just one profile each seems counter-productive. – MisterMiyagi Oct 09 '20 at 13:42
  • @MisterMiyagi My understanding of a profile may be incorrect, but I wanted to profile the function being run inside the loop and not the loop itself; that is, I don't want to time how fast the execution of each iteration is, but rather what was executed during each iteration. If each iteration took say 10s, I want to know what portions of that iteration made up the 10s so I can optimize. –  Oct 09 '20 at 13:54
  • @madeslurpy A profile records *everything* that happens in it. So a profile of the loop is not just the total loop; it is also the statistical data about everything inside it. Unless ``foo.compute`` behaves wildly differently depending on input, profiling the loop and simply ignoring the outermost layer should be appropriate. – MisterMiyagi Oct 09 '20 at 14:07

2 Answers


Instead of using run(), you can use runctx(), which takes globals and locals dictionaries as arguments.

cProfile.runctx("foo.compute(n)", {"n": n, "foo": Foo()}, {})
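
Not part of the original answer, but the same runctx() call can also wrap the entire loop if a single aggregated profile is wanted; a minimal sketch, assuming Foo and n are defined as in the question:

# One profile covering every iteration; the calls made inside compute()
# are aggregated across the whole loop.
cProfile.runctx(
    "for i in range(n): foo.compute(i)",
    {"n": n, "foo": Foo()},
    {},
    sort="cumulative",
)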

To aggregate the results from multiple profile runs, see How to calculate the average result of several cProfile results?.
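
For illustration, one way to merge several profile runs (a minimal sketch, not taken from the linked answer; the file names are just placeholders) is to dump each run to disk and combine the dumps with pstats.Stats.add():

import cProfile
import pstats

foo = Foo()

# Hypothetical illustration: profile a few separate runs, dump each profile
# to its own file, then merge the dumps into a single Stats object.
for i in range(3):
    cProfile.runctx("foo.compute(i)", {"i": i, "foo": foo}, {},
                    filename=f"run_{i}.prof")

combined = pstats.Stats("run_0.prof")
combined.add("run_1.prof", "run_2.prof")
combined.sort_stats("cumulative").print_stats(10)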

Christopher Peisert

To use cProfile.run, you need to pass the code to run as a string argument (a "command") rather than calling it directly. So, simply modifying your original code should work:

for i in range(n):
    cProfile.run("foo.compute(i)")

However, in your case, it seems like you want to profile the execution of a "real world" case and see where you have bottlenecks that you can optimize. Profiling the function is not the same as benchmarking it. The Python documentation on profilers also states:

Note: The profiler modules are designed to provide an execution profile for a given program, not for benchmarking purposes (for that, there is timeit for reasonably accurate results).

When benchmarking, you want to execute the command many times in a loop in order to get the average execution time ± standard deviation, which should be representative of what happens in reality and lets you compare different solutions. Profiling, however, breaks down a single run of the command. That is usually enough to identify which specific components of the method take more time than others (as opposed to how long they take in absolute terms), and that ranking shouldn't really change between runs, even though the specific timings may fluctuate.
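
For the benchmarking side, timeit.repeat can produce that kind of average-plus-spread directly; a minimal sketch, assuming foo is the Foo() instance from the question and compute(0) is just a placeholder call:

import timeit
import numpy as np

# Time 5 batches of 1000 calls each, then report the per-call mean and spread.
runs = timeit.repeat("foo.compute(0)", globals={"foo": foo}, repeat=5, number=1000)
per_call = np.array(runs) / 1000
print(f"{per_call.mean():.6f} s +/- {per_call.std():.6f} s")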

If you do actually want to calculate statistics on the average run, you could generate the stats over many runs and then extract the data to manipulate it further:

pr = cProfile.Profile()
pr.enable()

# Run the whole loop under the profiler so the stats aggregate every call
# made inside compute() across all iterations.
for i in range(n):
    foo.compute(i)

pr.disable()
pr.create_stats()
stats = pr.stats

(pr.stats is a dictionary holding the raw data that gets formatted into the output you see when you print a profile.)
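
Alternatively (this goes beyond the original answer), the same data can be read through pstats, which handles the sorting and formatting for you:

import pstats

# Show the 10 entries with the highest cumulative time, i.e. the parts of
# compute() that are the best candidates for optimization.
pstats.Stats(pr).sort_stats("cumulative").print_stats(10)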

tania