Description
In my code to create N
random strings of length k
, a generator expression to create a random k
-character string gets called either N+1
or (N+1)*k
(aka N*k+N
) times. This appears to be happening with the exact same code and I can switch back and forth seemingly non-deterministically by redefining the chrs()
helper function. I suspect the profiler but don't have any clue as to the mechanism.
System details, in case they matter: CPython, Python 2.7.17 and Python 3.6.9 on Ubuntu 18.04 on HP Envy 15k (i7-4720)
Reproducing the Issue
Not a proper MCVE as I haven't been able to deterministically reproduce the issue; instead, running this will hopefully result in one of the two profiler outputs below. If you try running this in the REPL and redefining the chrs()
function repeatedly using the same code, you might be able to get both profiler outputs.
I wrote this bits_strings()
function when benchmarking various ways of generating N
(num_strings
) random strings of length k
(len_string
):
import cProfile
import random
import string
def chrs(bits, len_string):
return ''.join(chr((bits >> (8*x)) & 0xff) for x in range(len_string))
def bits_strings(num_strings, len_string):
return list(
chrs(random.getrandbits(len_string*8), len_string)
for x in range(num_strings)
)
cProfile.run('bits_strings(1000, 2000)')
To benchmark it, I used cProfile.run()
.
When I ran the code the first time, it seemed decently fast:
cProfile.run('bits_strings(1000, 2000)')
2005005 function calls in 0.368 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.970 1.970 <stdin>:1(bits_strings)
1000 0.001 0.000 1.963 0.002 <stdin>:1(chrs)
1001 0.001 0.000 1.970 0.002 <stdin>:2(<genexpr>)
1 0.000 0.000 1.970 1.970 <string>:1(<module>)
2000000 0.173 0.000 0.173 0.000 {chr}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000 0.005 0.000 0.005 0.000 {method 'getrandbits' of '_random.Random' objects}
1000 0.178 0.000 1.953 0.002 {method 'join' of 'str' objects}
1001 0.009 0.000 0.009 0.000 {range}
but later in the day:
cProfile.run('bits_strings(1000, 2000)')
4005004 function calls in 1.960 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 1.961 1.961 <stdin>:1(bits_strings)
1000 0.001 0.000 1.954 0.002 <stdin>:1(chrs)
2001000 1.593 0.000 1.762 0.000 <stdin>:2(<genexpr>)
1 0.000 0.000 1.961 1.961 <string>:1(<module>)
2000000 0.170 0.000 0.170 0.000 {chr}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
1000 0.005 0.000 0.005 0.000 {method 'getrandbits' of '_random.Random' objects}
1000 0.182 0.000 1.944 0.002 {method 'join' of 'str' objects}
1001 0.009 0.000 0.009 0.000 {range}
The difference looks like it comes from the <genexpr>
, which- from the cumtime
and the location across various similar tests, I've inferred to be the one in chrs()
. (My instinct is that (chr((bits >> (8*x)) & 0xff) for x in range(len_string))
should be getting called N
times, while chr((bits >> (8*x)) & 0xff)
should be getting called N*k
times).
So what's the difference between the two bits_strings()
functions? When trying to find the difference in the code between the two attempts, I found none. In fact, I had a Python REPL session where I defined the functions, ran the profiler twice (the second time, to check whether this was a caching issue), and then re-defined the functions by using the up-arrow to retrieve my previously-entered code. It exhibited the same divergence: https://pastebin.com/1u1j1ZUt
Theories & Thoughts
I suspect cProfile.
Switched between the "fast" and "slow" versions by redefining chrs()
but had cProfile.run()
get profiled by cProfile.run()
. In both observed time and per the outer cProfile.run()
's reporting, it looks like the inner cProfile.run()
(the one that profiles bits_strings()
) has been occasionally underreporting how long bits_strings()
takes: https://pastebin.com/C4W4FEjJ
I also used time.time()
before/after cProfile.run('bits_strings(1000,1000))
calls and noticed a discrepancy between the "fast" profile cumtime
and the actual wall-clock time taken when running the method.
But I'm still unclear on the mechanism.
Caching: Same code, drastically different performance between runs could be a caching thing. However, the fast version was the first run I ever did with this code (although I'd run other code before in the same REPL session). And I can switch from slow to fast or fast to slow by simply redefining
chrs()
andbits_strings()
, although whether or not I actually switch seems random. Also worth noting is that I've only been able to switch (afaict) by redefining the functions, not by simply rerunning them. So I'm not sure what would be cached or how it would change the # of<genexpr>
calls by a factor ofk
upon redefining the function.cProfile quirk: I ran the same code without
cProfile.run()
and timed it by getting the currenttime.time()
insidebits_strings
twice; the total time taken was comparable to that withcProfile.run()
for the slow case. SocProfile.run()
is at least not slowing things down. There is a possibility that the fast result is tainted bycProfile.run()
,, but anecdotally for large values of.N
andk
((10000, 10000)
) I noticed a perceptible difference between the first run ofbits_strings()
and the slow one later in the dayinterpreter quirk: I thought this could be happening because of the interpreter wrongly optimizing the generator expressions inside
chrs()
andbits_strings()
, but this is the you-pay-for-what-you-write CPython interpreter so that seems unlikely.Python 2.7 quirk: Redefining
chrs()
(using the up-arrow to retrieve the prior definition) inconsistently does the same trick inpython3
: https://pastebin.com/Uw7PgF7iZalgo coming to exact revenge for that one time I parsed HTML with regex: not ruled out
I'm tempted to believe that bits_strings()
, at different times (different definitions of the same function with the same code, not different runs), has actually performed differently. But I have no clue as to why that could be the case. I suspect this is related more closely to chrs()
than it is to bits_strings()
.
I'm unclear on why redefining chrs()
seems to be unpredictably affecting the profiler output.