We examined several profilers and found that, on the benchmark we used for testing (from the Pyperformance suite), kernprof causes roughly a 7x slowdown when executing pure Python code. That is remarkably close to the execution time dilation you observed (6.25x), so I believe this is almost certainly the cause.
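
To get a feel for why line-level instrumentation dilates pure Python code so much, here is a minimal sketch that times a small loop with and without a per-line trace hook. It uses `sys.settrace` as a stand-in for a line profiler's callback; that is an assumption for illustration, not a description of how kernprof is implemented, and `workload`, `line_tracer`, `timed`, and `measure_slowdown` are hypothetical names. The ratio you see will vary by machine and Python version.

```python
import sys
import time


def workload(n=200_000):
    # Pure Python loop with many short lines: the worst case for per-line hooks.
    total = 0
    for i in range(n):
        total += i * i
    return total


def line_tracer(frame, event, arg):
    # Illustrative hook: does no bookkeeping, but is still invoked for every
    # call/line/return event in traced frames.
    return line_tracer


def timed(fn):
    # Wall-clock time of a single call to fn.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def measure_slowdown():
    # Ratio of traced run time to untraced run time for the same workload.
    baseline = timed(workload)
    sys.settrace(line_tracer)
    try:
        traced = timed(workload)
    finally:
        sys.settrace(None)  # always remove the hook again
    return traced / baseline


if __name__ == "__main__":
    print(f"slowdown under tracing: {measure_slowdown():.1f}x")
```

Even a hook that does nothing adds a function call per executed line, which is why code dominated by short Python statements is hit hardest, while time spent inside C extensions is barely affected.
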
If you want a profiler with much less overhead (nearly none), and therefore more accurate timings, there are a handful of options. I personally recommend Scalene, which profiles CPU, GPU, and memory simultaneously, with very low overhead and without any changes to your code (that is, you do not need to add @profile decorators). Full disclosure: I am one of the primary authors of Scalene.