I'm evaluating dill and I want to know if this scenario is handled. I have a module that imports successfully in one Python process. Can I use dill to serialize that module and then load it in a different process whose sys.path does not include the module's directory? Right now I get import failures, but maybe I'm doing something wrong.
Here's an example. First I run this dump script from an environment where foo.py's directory is on my sys.path:
% cat dill_dump.py
import dill
import foo
myFile = "./foo.pkl"
fh = open(myFile, 'wb')
dill.dump(foo, fh)
fh.close()
Now, I run this script in an environment where foo.py's directory is not on my sys.path:
% cat dill_load.py
import dill
myFile = "./foo.pkl"
fh = open(myFile, 'rb')
foo = dill.load(fh)
fh.close()
print foo
It fails with this stack trace:
Traceback (most recent call last):
File "dill_load.py", line 4, in <module>
foo = dill.load(fh)
File "/home/b/lib/python/dill-0.2.4-py2.6.egg/dill/dill.py", line 199, in load
obj = pik.load()
File "/rel/lang/python/2.6.4-8/lib/python2.6/pickle.py", line 858, in load
dispatch[key](self)
File "/rel/lang/python/2.6.4-8/lib/python2.6/pickle.py", line 1133, in load_reduce
value = func(*args)
File "/home/b/lib/python/dill-0.2.4-py2.6.egg/dill/dill.py", line 678, in _import_module
return __import__(import_name)
ImportError: No module named foo
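For what it's worth, the traceback suggests what's happening: the _import_module frame shows that dill stored only the module's name and calls __import__ again at load time, so the loading process still needs foo on its sys.path. A rough sketch of that by-reference scheme, in Python 3 terms (the reducer below is my own illustration, not dill's actual code):

```python
import copyreg
import importlib
import pickle
import types

# Hypothetical reducer mimicking by-reference module pickling: only the
# module's name goes into the pickle; loading re-imports it by that name.
def reduce_module_by_ref(mod):
    return (importlib.import_module, (mod.__name__,))

copyreg.pickle(types.ModuleType, reduce_module_by_ref)

import json  # any importable stdlib module stands in for foo here
payload = pickle.dumps(json)
assert b"json" in payload              # the name is stored, not the code
restored = pickle.loads(payload)       # works only because json is importable
assert restored is json
```

Under this scheme the pickle is tiny, but it is really just a deferred import statement, which matches the failure I'm seeing.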
So, if the two processes need the same Python path anyway, what's the point of serializing a Python module? Or in other words, is there any advantage to loading foo via dill over just calling "import foo"?
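What I was hoping for is serialization "by value": shipping the module's source along so the receiving process can rebuild it without any sys.path entry. A hand-rolled sketch of that idea in Python 3 (the source string and the rebuild_module helper are made up for illustration, e.g. the source could come from inspect.getsource(foo) in the sending process):

```python
import types

# Pretend this string was captured in the sending process,
# e.g. via inspect.getsource(foo).
foo_source = "def greet():\n    return 'hello from foo'\n"

def rebuild_module(name, source):
    # Create an empty module object and execute the shipped source in its
    # namespace; no import machinery or sys.path lookup is involved.
    mod = types.ModuleType(name)
    exec(source, mod.__dict__)
    return mod

foo = rebuild_module("foo", foo_source)
assert foo.greet() == 'hello from foo'
```

Is something along these lines possible with dill, or is by-reference pickling the only supported behavior for modules?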