
I have a Python script that runs well when I run it normally:

$ python script.py <options>

I am attempting to profile the code using the cProfile module:

$ python -m cProfile -o script.prof script.py <options>

When I launch the above command I get an error regarding being unable to pickle a function:

Traceback (most recent call last):
  File "scripts/process_grid.py", line 1500, in <module>
    _compute_write_index(kwrgs)
  File "scripts/process_grid.py", line 626, in _compute_write_index
    args,
  File "scripts/process_grid.py", line 1034, in _parallel_process
    pool.map(_apply_along_axis_palmers, chunk_params)
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 266, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/james/miniconda3/envs/climate/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function _apply_along_axis_palmers at 0x7fe05a540b70>: attribute lookup _apply_along_axis_palmers on __main__ failed

The code uses multiprocessing, and I assume that this is where the pickling is taking place.

The code in play is here on GitHub.

Essentially I'm mapping a function and a corresponding argument dictionary in a process pool:

pool.map(_apply_along_axis_palmers, chunk_params)

The function _apply_along_axis_palmers is "picklable" as far as I know, in that it's defined at the top level of the module. Again this error doesn't occur when running outside of the cProfile context, so maybe that's adding additional constraints for pickling?

Can anyone comment as to why this may be happening, and/or how I can rectify the issue?

halfer
James Adams
  • FWIW, the only trouble I've ever had with pickling is pickling functions. – JacobIRR Dec 21 '18 at 20:57
  • Have a try with [this](https://stackoverflow.com/a/23423657/9059420). – Darkonaut Dec 21 '18 at 21:12
  • Thanks, @Darkonaut, I'm not sure how I didn't find that myself before writing this question. My issue is regarding a function rather than a class, which is what that question addresses, but maybe an idea is to create the function in a separate module file, as one of the answers suggests as a solution? I will give that a whirl... – James Adams Dec 21 '18 at 21:21
  • I have not tested this but maybe you can get away with just creating an intermediate `profile.py` which just imports and calls your `main` from your real target script. – Darkonaut Dec 21 '18 at 21:50

1 Answer


The problem here is that, when you run with -m cProfile, the __main__ module is cProfile (the actual entry point of the code), not your script. cProfile tries to compensate by ensuring that your script sees __name__ as "__main__" when it runs, so the script knows it's being run as a script rather than imported as a module, but sys.modules['__main__'] remains the cProfile module.
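
To see the mismatch for yourself, a small diagnostic like the following may help. It is only a sketch: describe_main is a hypothetical helper name, and what gets reported can differ between Python versions.

```python
import sys


def describe_main():
    """Return (file backing sys.modules['__main__'], the __name__ this code sees)."""
    main_module = sys.modules.get("__main__")
    # Some __main__ modules (e.g. an interactive session) have no __file__.
    main_file = getattr(main_module, "__file__", None)
    return main_file, __name__


if __name__ == "__main__":
    print(describe_main())
```

Saved as a script and run once directly and once with -m cProfile, the first element of the tuple shows which file is really registered as __main__. On the Python 3.6 setup from the question, the explanation above implies the profiled run would report cProfile's own file rather than the script's.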

The problem is that pickle handles functions by pickling just their qualified name (plus some boilerplate to say it's a function in the first place). To make sure the function will survive the round trip, it always double checks that the qualified name can be looked up through sys.modules. So when you do pickle.dumps(_apply_along_axis_palmers) (explicitly, or implicitly in this case by passing it as the mapper function to pool.map), where _apply_along_axis_palmers is defined in your main script, it double checks that sys.modules['__main__']._apply_along_axis_palmers exists. But it doesn't, because cProfile._apply_along_axis_palmers doesn't exist.
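
A minimal sketch of that by-reference behaviour, using json.dumps as a convenient importable function. The orphan function and the reassignment of its __module__ are a hypothetical stand-in for the profiling situation, not something taken from the original script.

```python
import json
import pickle

# A function is pickled by reference: the payload carries its module name
# and qualified name, not its code.
payload = pickle.dumps(json.dumps)
names_in_payload = b"json" in payload and b"dumps" in payload

# Unpickling looks the name up again and hands back the very same object.
restored = pickle.loads(payload)
same_object = restored is json.dumps


def orphan(x):
    return x


# Hypothetical stand-in for the cProfile case: claim the function lives in
# a module that has no attribute with that name.
orphan.__module__ = "json"
orphan.__qualname__ = "orphan"

try:
    pickle.dumps(orphan)
    lookup_failed = False
except pickle.PicklingError:
    # pickle refuses because the lookup of "orphan" on the json module fails.
    lookup_failed = True

print(names_in_payload, same_object, lookup_failed)
```

The failing case mirrors the traceback in the question: the function claims to belong to one module, but that module, as found in sys.modules, has no such attribute.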

I don't know of a good solution for this. The best I can come up with is to manually fix up sys.modules to make it expose your module and its contents correctly. I haven't tested this completely, so it's possible there will be some quirks, but a solution I've found is to change a module named mymodule.py of the form:

# imports...
# function/class/global defs...

if __name__ == '__main__':
    main()  # Or series of statements

to:

# imports...
import sys
# function/class/global defs...

if __name__ == '__main__':
    import cProfile
    # if check avoids hackery when not profiling
    # Optional; hackery *seems* to work fine even when not profiling, it's just wasteful
    if sys.modules['__main__'].__file__ == cProfile.__file__:
        import mymodule  # Imports you again (does *not* use cache or execute as __main__)
        globals().update(vars(mymodule))  # Replaces current contents with newly imported stuff
        sys.modules['__main__'] = mymodule  # Ensures pickle lookups on __main__ find matching version
    main()  # Or series of statements

From there on out, sys.modules['__main__'] refers to your own module, not cProfile, so pickling finds your functions as expected, and cProfile still seems to work despite the swap. The only real cost is reimporting your module, but if you're doing enough real work, the cost of reimporting should be fairly small.
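
Another option, along the lines of the launcher idea raised in the comments on the question, is to skip -m cProfile and drive the profiler from a small separate script that imports your module. Your functions then belong to mymodule rather than __main__, so pickle can look them up by that name. This is a rough sketch I haven't tested against the original code; profile_callable and the launcher file are hypothetical names.

```python
import cProfile


def profile_callable(func, output_path, *args, **kwargs):
    """Run func under cProfile and write the stats to output_path.

    Returns whatever func returns. Only the calling process is profiled;
    work done inside pool workers is not captured.
    """
    profiler = cProfile.Profile()
    try:
        return profiler.runcall(func, *args, **kwargs)
    finally:
        # Write the stats even if func raised, so partial runs can be inspected.
        profiler.dump_stats(output_path)
```

In a hypothetical run_profile.py sitting next to mymodule.py, you would import mymodule and call profile_callable(mymodule.main, 'script.prof') under an if __name__ == '__main__': guard, then launch it with plain python run_profile.py <options>. The resulting file can be read with pstats in the same way as the output of -o.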

ShadowRanger
  • Brilliant answer, @ShadowRanger, you've taken me to school. Is this more or less what's described [here](https://stackoverflow.com/a/11513567/85248)? – James Adams Dec 21 '18 at 21:30
  • @JamesAdams: Yeah, the approach taken there is similar; it basically ignores the `__main__` module entirely in favor of reimporting it as non-`__main__` and using only the versions that are properly qualified to refer to the non-`__main__` version of the module. – ShadowRanger Dec 22 '18 at 01:46
  • I have the same problem, but my script is very simple. I have my class and the function that calls it in the same py file. Therefore I'm not importing a 'mymodule'. Is a variant of this solution still possible? – Liam McIntyre Mar 05 '20 at 01:49
  • @LiamMcIntyre: Just import yourself anyway; whatever your script is named, import it as shown and update it. – ShadowRanger Mar 05 '20 at 01:55
  • This solved my problem! No side effects noticed, but it does seem filthy. – Nick Crews Feb 08 '23 at 02:00