
I'm generating 2 session dumps with

import dill
a = 1234
dill.dump_session('test1.session')

and

import dill
b = 1234
dill.dump_session('test2.session')

Then I load them back sequentially, each into a freshly created module:

import types
import dill

mod = types.ModuleType('test')
print(mod.__dict__.get('a', None))
print(mod.__dict__.get('b', None))

dill.load_session('test1.session', main=mod)
print(mod.__dict__.get('a', None))
print(mod.__dict__.get('b', None))


mod = types.ModuleType('test')
print(mod.__dict__.get('a', None))
print(mod.__dict__.get('b', None))

dill.load_session('test2.session', main=mod)
print(mod.__dict__.get('a', None))
print(mod.__dict__.get('b', None))

I get the following output:

None   # a
None   # b
1234   # a
None   # b
None   # a
None   # b
1234   # a
1234   # b

Before the second session file is read, the output clearly shows that mod has neither a nor b set. But after reading test2.session, both a and b are suddenly set, even though only b was dumped into that file.

Is this a bug in dill, or am I missing something here?

wasp256

2 Answers


As I understand it, the current behavior expects the same module to be passed to the main parameter of both dump_session() and load_session(). That makes sense if you consider that all the classes and functions defined in the saved module have their __module__ attributes pointing to it (and __qualname__ values that are only meaningful within it).

That said, there are two issues:

  1. The main parameter of load_session() is redundant, or at least should have None as default;
  2. The current implementation of session loading is actually doing some duplicated work. It does the equivalent of the following sequence of operations:
# In Unpickler.load()
dump_main = importlib.import_module("<dump_main>")
vars(dump_main).update(loaded_module_dict)  # once
# In load_session(..., main=load_main)
vars(load_main).update(vars(dump_main))  # twice
_restore_module(load_main)  # restore imported objects saved by reference (with byref=True)

This explains the strange outcome when saving __main__ and restoring it into module test, and it's easy to fix.
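To make the two-step update concrete, here is a dill-free simulation. It is only a sketch of the sequence described above, not dill's actual code: `dump_main` stands in for the shared __main__ module, `simulated_load_session` is a hypothetical helper, and the two dicts stand in for the contents of test1.session and test2.session. Because every load first writes into the same shared module and then copies everything it holds, `a` from the first load leaks into the second.

```python
import types

# Stand-in for __main__: one shared module that every load writes into first.
dump_main = types.ModuleType("dump_main")

def simulated_load_session(saved_vars, load_main):
    """Hypothetical mimic of the two updates described above (not dill's code)."""
    vars(dump_main).update(saved_vars)        # step 1: unpickler fills the shared module
    vars(load_main).update(vars(dump_main))   # step 2: copy *everything* it now holds

session1 = {"a": 1234}   # stands in for test1.session
session2 = {"b": 1234}   # stands in for test2.session

mod1 = types.ModuleType("test")
simulated_load_session(session1, mod1)
print(mod1.__dict__.get("a"), mod1.__dict__.get("b"))   # 1234 None

mod2 = types.ModuleType("test")
simulated_load_session(session2, mod2)
print(mod2.__dict__.get("a"), mod2.__dict__.get("b"))   # 1234 1234  ('a' leaked in)
```

In this simulation step 2 also copies the shared module's dunder attributes, so `mod2.__name__` ends up as "dump_main" rather than "test".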

What I'm less sure about is what the expected behavior should be when saving a module and restoring its contents into a different one, or whether that should be allowed at all.


Note: if a saved module was created at runtime, it can be recreated manually before calling load_session():

import sys, types
sys.modules['name'] = types.ModuleType('name')

Maybe dill could also do this automatically, but that is a potential source of bugs if the user expects the module to actually exist and be importable. Would a warning suffice?
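One way the "recreate it, but warn" idea could look is sketched below. `ensure_module` is a hypothetical helper, not something dill provides: it first tries a real import, and only if the module cannot be found does it register an empty runtime module in sys.modules and emit a RuntimeWarning, so a user who expected a real module gets notified.

```python
import importlib
import sys
import types
import warnings

def ensure_module(name):
    """Hypothetical helper: return module `name`, creating an empty one (with a warning) if missing."""
    try:
        return importlib.import_module(name)   # prefer a real, importable module
    except ModuleNotFoundError:
        warnings.warn(f"module {name!r} is not importable; creating an empty one",
                      RuntimeWarning)
        module = types.ModuleType(name)
        sys.modules[name] = module             # later imports now resolve to this object
        return module

mod = ensure_module("runtime_only_mod")        # warns, then registers a new module
print(importlib.import_module("runtime_only_mod") is mod)   # True
```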


Update

Here is a good example of why this usage should not be allowed in load_session():

>>> class Point:
...   def __init__(self, x, y):
...     self.x = x
...     self.y = y
>>> p = Point(1, 2)
>>> import dill
>>> dill.dump_session()

In a different session:

>>> import dill, sys, types
>>> sys.modules['test'] = test = types.ModuleType('test')
>>> dill.load_session(main=test)
>>> q = dill.copy(test.p)
  File "/usr/lib/python3.8/pickle.py", line 1070, in save_global
    raise PicklingError(
_pickle.PicklingError: Can't pickle <class '__main__.Point'>: it's not found as __main__.Point

Beyond this simple example, not respecting namespaces is a potential source of many subtle, silent bugs.
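The root cause is that pickling a class by reference only stores its module and qualified name, and the pickler must be able to find the class again at that location. The sketch below reproduces this with the standard pickle module alone (no dill); `runtime_mod` and the `Point` class built with `type()` are illustrative stand-ins for a class whose recorded namespace no longer matches where it actually lives.

```python
import pickle
import sys
import types

# A module created at runtime and registered so pickle can import it.
runtime_mod = types.ModuleType("runtime_mod")
sys.modules["runtime_mod"] = runtime_mod

# A class that *claims* to live in runtime_mod, but is not an attribute of it yet.
Point = type("Point", (), {"__module__": "runtime_mod"})

def can_pickle(obj):
    """Return True if pickle can serialize obj, False on a PicklingError."""
    try:
        pickle.dumps(obj)
        return True
    except pickle.PicklingError:
        return False

print(can_pickle(Point))                           # False: runtime_mod.Point not found
runtime_mod.Point = Point                          # make the recorded name resolvable
print(can_pickle(Point))                           # True
print(pickle.loads(pickle.dumps(Point)) is Point)  # True: restored by reference
```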

I don't know what the OP was trying to accomplish, but for simple cases you could do something like dill.dump(vars(mod1), file) in one session and then vars(mod2).update(dill.load(file)) in the other. Classes would be saved by reference, but functions can be saved in full with the recurse option. I would at least save {k: v for k, v in vars(mod1).items() if not k.startswith('__')} instead of the whole vars(mod1), though, so as not to overwrite mod2's own attributes such as __name__.
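A minimal sketch of that approach, assuming only plain data is transferred. An in-memory buffer stands in for the file, and, as an assumption for environments without dill installed, it falls back to the standard pickle module, whose dump/load take the same (obj, file) and (file) arguments. `public_vars` is a hypothetical helper implementing the dunder filter.

```python
import io
import types

try:
    import dill as serializer      # what the answer suggests
except ImportError:                # assumption: stdlib fallback, fine for plain data
    import pickle as serializer

def public_vars(module):
    """Hypothetical helper: module attributes minus dunder names like __name__."""
    return {k: v for k, v in vars(module).items() if not k.startswith("__")}

mod1 = types.ModuleType("mod1")
mod1.a = 1234
mod1.config = {"verbose": True}

buffer = io.BytesIO()              # stands in for a file on disk
serializer.dump(public_vars(mod1), buffer)

buffer.seek(0)
mod2 = types.ModuleType("mod2")
vars(mod2).update(serializer.load(buffer))
print(mod2.a, mod2.config, mod2.__name__)   # 1234 {'verbose': True} mod2
```

Because the dunder names are filtered out, mod2 keeps its own identity attributes.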

leogama
  • I agree... I believe the expected behavior was that `main` in `dump_session` and `load_session` should be the same. I don't think the docs state that they should be the same... and the interface as it is now is a bit inviting to do something like the OP tried to do here. So, as I asked... what should be the expected behavior here? Should `dill` allow the use case the OP is attempting? – Mike McKerns Jun 03 '22 at 17:21
>>> import types
>>> import dill
>>> 
>>> mod = types.ModuleType('test')
>>> mod
<module 'test'>
>>> print(mod.__dict__.get('a', None))
None
>>> print(mod.__dict__.get('b', None))
None
>>> 
>>> dill.load_session('test1.session', main=mod)
>>> print(mod.__dict__.get('a', None))
1234
>>> print(mod.__dict__.get('b', None))
None
>>> mod
<module '__main__' (<_frozen_importlib_external.SourceFileLoader object at 0x10aa2bc40>)>
>>> mod = types.ModuleType('test')
>>> mod
<module 'test'>
>>> print(mod.__dict__.get('a', None))
None
>>> print(mod.__dict__.get('b', None))
None
>>> 
>>> dill.load_session('test2.session', main=mod)
>>> print(mod.__dict__.get('a', None))
1234
>>> print(mod.__dict__.get('b', None))
1234
>>> mod
<module '__main__' (<_frozen_importlib_external.SourceFileLoader object at 0x10aa2bc40>)>
>>> 
>>> import __main__
>>> __main__.__dict__.get('a', None)
1234
>>> __main__.__dict__.get('b', None)
1234
>>> __main__
<module '__main__' (<_frozen_importlib_external.SourceFileLoader object at 0x10aa2bc40>)>

Note that dill is not loading the session into the module test that you created; it's loading into __main__. Each of those reprs shows the same loader object (at 0x10aa2bc40) as the real __main__ module, so it's the same module being loaded into each time.
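If you want to confirm which object received the data after load_session(), an identity comparison against sys.modules['__main__'] is more direct than reading reprs, because a module's repr is derived from attributes such as __name__ that can themselves be overwritten. A dill-free sketch; `describe` is a hypothetical helper, and `other` is just a stand-in module named "__main__", not the real one:

```python
import sys
import types

def describe(mod):
    """Hypothetical helper: a module's name and whether it *is* the interpreter's __main__."""
    return mod.__name__, mod is sys.modules["__main__"]

mod = types.ModuleType("test")
print(describe(mod))   # ('test', False)

# Copying another module's attributes (including __name__) changes how mod prints,
# but not which object it is.
other = types.ModuleType("__main__")   # stand-in; not the real __main__
vars(mod).update(vars(other))
print(repr(mod))
print(describe(mod))   # ('__main__', False)
```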

Is this a bug in dill? I'm not sure, and would have to look at what is expected of the main keyword argument. It might be interesting to open a GitHub issue on dill, and start a discussion on what the expected behavior of the main arg should be... or at least to request some clarity in the docs so it's clear what the behavior should be. [NOTE: I'm the dill author]

Mike McKerns