As I understand it, the current behavior expects the same module to be passed to the main parameter in both dump_session() and load_session(), which makes sense given that all the classes and functions defined in the saved module have their __module__ and __qualname__ attributes pointing to it.
That said, there are two issues:
- The main parameter of load_session() is redundant, or at least should have None as its default;
- The current implementation of session loading is actually doing some duplicated work. It does the equivalent of the following sequence of operations:
# In Unpickler.load()
dump_main = importlib.import_module("<dump_main>")
vars(dump_main).update(loaded_module_dict)  # first update: into the module that was saved
# In load_session(..., main=load_main)
vars(load_main).update(vars(dump_main))  # second update: same contents copied again
_restore_module(load_main)  # restore imported objects saved by reference (with byref=True)
This explains the strange outcome when saving __main__ and restoring it in module test, and it's easy to fix.
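The double update can be reproduced without dill. This is a minimal sketch, assuming two stand-in modules built with types.ModuleType (the names dump_main_demo and load_main_demo are made up) and a plain dict in place of the unpickled module state. It mirrors the sequence above in a simplified model and is not dill's actual code.

```python
import types

# Stand-ins for the module that was saved and the one passed as main=...
dump_main = types.ModuleType("dump_main_demo")
load_main = types.ModuleType("load_main_demo")

# Stand-in for the module dict recovered from the session file.
loaded_module_dict = {"x": 1, "y": [1, 2, 3]}

# Step 1: what unpickling the module effectively does.
vars(dump_main).update(loaded_module_dict)

# Step 2: what load_session(main=load_main) then does on top of it.
vars(load_main).update(vars(dump_main))

# Both modules now hold the very same restored objects, and load_main
# has also picked up dump_main's own metadata, such as __name__.
print(dump_main.x, load_main.x)
print(load_main.y is dump_main.y)
print(load_main.__name__)
```

In this model the state is written into the saved module even though the caller asked for another one, and the target module ends up carrying the saved module's dunder attributes.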
What I'm not so sure about is what the expected behavior should be when saving a module and restoring its contents in a different one, or whether that should be allowed at all.
Note: if a saved module was created at runtime, it can be recreated manually before calling load_session():
import sys, types
sys.modules['name'] = types.ModuleType('name')
Maybe dill can also do it automatically somehow, but that is a likely source of bugs if the user expects the module to actually exist and be importable. Would a warning suffice?
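One way the automatic recreation plus warning could look is sketched below. ensure_module is a hypothetical helper, not something dill provides, and the module name used in the demo is made up; the sketch only uses the standard library.

```python
import importlib
import sys
import types
import warnings


def ensure_module(name):
    """Return the module called `name`, creating an empty one if needed.

    Hypothetical helper for illustration; it is not part of dill.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        # The module cannot be imported, so it was probably created at
        # runtime in the session that was saved. Warn instead of failing.
        warnings.warn(
            f"module {name!r} could not be imported; creating an empty "
            "placeholder module",
            RuntimeWarning,
            stacklevel=2,
        )
        module = types.ModuleType(name)
        sys.modules[name] = module
        return module


# Demo: the name below does not exist, so a placeholder is registered.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mod = ensure_module("runtime_only_module_demo")
```

The warning keeps the fallback visible, so a user who expected a real, importable module finds out that an empty placeholder was used instead.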
Update
Here is a good example of why this usage should not be allowed in load_session():
>>> class Point:
...     def __init__(self, x, y):
...         self.x = x
...         self.y = y
...
>>> p = Point(1, 2)
>>> import dill
>>> dill.dump_session()
In a different session:
>>> import dill, sys, types
>>> sys.modules['test'] = test = types.ModuleType('test')
>>> dill.load_session(main=test)
>>> q = dill.copy(test.p)
File "/usr/lib/python3.8/pickle.py", line 1070, in save_global
raise PicklingError(
_pickle.PicklingError: Can't pickle <class '__main__.Point'>: it's not found as __main__.Point
Beyond this simple example, not respecting namespaces is a potential source of many subtle, silent bugs...
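The root cause can be shown with the standard pickle module alone. This is a sketch under assumptions: saved_session_demo and test_demo are made-up module names, and the saved module is deliberately left out of sys.modules to imitate a class defined in another session. It does not call dill at all.

```python
import pickle
import sys
import types

# Plays the role of the saved __main__. It is never registered in
# sys.modules, as if it belonged to a different interpreter session.
saved = types.ModuleType("saved_session_demo")
exec(
    "class Point:\n"
    "    def __init__(self, x, y):\n"
    "        self.x = x\n"
    "        self.y = y\n"
    "p = Point(1, 2)\n",
    vars(saved),
)

# Restore the contents into a differently named module.
test = types.ModuleType("test_demo")
sys.modules["test_demo"] = test
vars(test).update(
    {k: v for k, v in vars(saved).items() if not k.startswith("__")}
)

# The class still claims to live in the original module...
print(test.Point.__module__)

# ...so pickling an instance by reference cannot locate the class.
try:
    pickle.dumps(test.p)
except pickle.PicklingError as exc:
    error = exc
    print("pickling failed:", exc)
else:
    error = None
```

Copying names into another namespace does not change where a class says it lives, and that location is exactly what pickling by reference relies on.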
I don't know what the OP was trying to accomplish, but for simple cases they could do something like dill.dump(vars(mod1), file) and then vars(mod2).update(dill.load(file)) in the other session. It would save classes by reference, but functions can be saved fully with the recurse option. I would at least save {k: v for k, v in vars(mod1).items() if not k.startswith('__')} instead of the whole vars(mod1), though, so as not to mess with mod2's own attributes.
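A runnable sketch of that workaround follows. To keep it free of dependencies it swaps in the standard pickle module for dill and uses an in-memory buffer instead of a real file; mod1_demo, mod2_demo and the attribute names are made up. With dill, the dump and load calls take the same arguments, and dill would additionally handle objects such as functions that plain pickle cannot save from a runtime-created module.

```python
import io
import pickle
import types

# Source module with some simple state (illustrative names and values).
mod1 = types.ModuleType("mod1_demo")
mod1.threshold = 0.5
mod1.labels = ["a", "b"]

# Target module, as it would exist in the other session.
mod2 = types.ModuleType("mod2_demo")

# Keep only non-dunder names so mod2's own __name__, __doc__, etc. survive.
state = {k: v for k, v in vars(mod1).items() if not k.startswith("__")}

file = io.BytesIO()       # stands in for a real file opened in "wb" mode
pickle.dump(state, file)  # with dill: dill.dump(state, file)

file.seek(0)              # in another session this would be a fresh open()
vars(mod2).update(pickle.load(file))  # with dill: dill.load(file)

print(mod2.threshold, mod2.labels, mod2.__name__)
```

Filtering before dumping means mod2 keeps its own identity: only the user-defined names are transferred.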