
I'm trying to profile some very simple code (using both cProfile and pyinstrument). The code is:

sum(1 for e in range(1533939))

Without the profiler active, the code runs very quickly (~85 ms). However, when I run the same code under a profiler, it suddenly takes almost 13 seconds.

I'm doing this (in a Jupyter notebook):

%%prun

sum(1 for e in range(1533939))

I figured the problem is the overhead caused by the numerous calls to "next" inside the generator expression. However, running the same experiment on my host machine (not inside the container) shows no slowdown when profiling.
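To make the "numerous calls" concrete, here is a minimal sketch that counts how many calls cProfile records for this expression. The helper name `count_profiled_calls` is made up for illustration, and the exact count may vary a little between Python versions; the point is only that the profiler handles at least one event per generated item.

```python
import cProfile
import pstats


def count_profiled_calls(n):
    """Profile sum(1 for e in range(n)); return (result, calls recorded by cProfile).

    Hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    total = sum(1 for e in range(n))
    profiler.disable()
    # pstats.Stats accepts a Profile object and exposes the total call count.
    stats = pstats.Stats(profiler)
    return total, stats.total_calls


if __name__ == "__main__":
    # Same range size as in the question.
    result, calls = count_profiled_calls(1533939)
    print(f"result={result}, calls recorded by cProfile={calls}")
```

Each resumption of the generator expression is recorded as a call, so any fixed per-event cost in the profiler is multiplied by roughly the size of the range.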

Any idea why the profiler might be slowing this code down so much?

For the record, I'm using Jupyter's container "jupyter/scipy-notebook" as the base container.

Thanks!

Edan Maor
  • Any volumes attached to this container? And what is your host OS? – β.εηοιτ.βε Jun 08 '20 at 19:25
  • Host OS is MacOS. The code directory is connected to my host. – Edan Maor Jun 09 '20 at 06:52
  • I would guess you are hitting the infamous thread https://forums.docker.com/t/file-access-in-mounted-volumes-extremely-slow-cpu-bound/8076 or issue https://github.com/docker/for-mac/issues/77. The profiler might need some extra IO compared to the normal execution of your script, and so you only hit this when profiling. Have you tried the instructions on the [performance tuning for volume page](https://docs.docker.com/docker-for-mac/osxfs-caching/) yet? – β.εηοιτ.βε Jun 09 '20 at 09:48
  • I am not sure, but this is probably because the selected execution environment consumes extra resources. For example, a lambda function with a simple arithmetic problem executes within seconds, but the same code written as a Glue job or in a Jupyter notebook will run PySpark jobs and Hadoop execution with mappers, which is unnecessary for a small dataset. That is why Hadoop/big data tooling is preferable for large datasets only, and your case is similar. – DHEERAJ Jun 09 '20 at 14:46
  • Are you running the same version of Python both inside and outside the container? Are you saying that running something like `python -c 'import cProfile; cProfile.run("sum(1 for e in range(1533939))")'` directly is dramatically different between the two? – Anon Jun 10 '20 at 19:19
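Following up on the last comment, a small script along these lines could be run both inside the container and on the host to compare like with like. This is only a sketch: the names `workload`, `timed`, and `compare` are hypothetical, and the timings it prints depend entirely on the machine and Python build.

```python
import cProfile
import platform
import time

N = 1533939  # same range size as in the question


def workload(n=N):
    """The expression from the question."""
    return sum(1 for e in range(n))


def timed(func, *args):
    """Return (result, elapsed wall-clock seconds) for func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def compare(n=N):
    """Time the workload without and with cProfile, and record the Python version."""
    _, plain = timed(workload, n)
    profiler = cProfile.Profile()
    # Profile.runcall runs the function under the profiler and returns its result.
    _, profiled = timed(profiler.runcall, workload, n)
    return {
        "python": platform.python_version(),
        "plain_s": plain,
        "profiled_s": profiled,
    }


if __name__ == "__main__":
    print(compare())
```

Running it outside Jupyter removes the notebook and `%%prun` from the picture, so a large gap between the two environments would point at the interpreter or the container setup and not at the notebook.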

0 Answers