Overview

I need an Azure python function to serve up a pickled object, and I need the resultant pkl file to be reasonably portable, so that it can be used in any typical notebook or python code file with minimal imports.

Test code used to generate test pkl file bytes:

import dill

def test():
    print("It worked!")

result = dill.dumps(test)

The result is manually saved to disk as a pkl file.

Code used to load the pkl file in another project:

import dill

# The context manager closes the file even if loading fails.
with open('test.pkl', 'rb') as file:
    test = dill.load(file)

test()

The problem

The above code works in all of my test cases. However, when I generate the pkl file from the project that will be deployed to Azure, the resulting file cannot be loaded in another project, such as a Jupyter notebook.

When I attempt to load the pkl file, the terminal outputs:

ModuleNotFoundError: No module named '__app__'

Edit: after some digging, it appears that the __app__ module is the default top-level module for Azure functions. What's more, dill is including absolute paths in the output when attempting to serialize functions and lambdas, which doesn't seem right. Does anyone know of a workaround? I'm not terribly familiar with the territory, and any pointers anyone can offer here would be much appreciated.

RTD

1 Answer

I'm the dill author. dill assumes the same environment is on the different resources -- meaning that the same versions of the same modules are installed. It can still work if the above is not true, but it's not guaranteed. This condition is not exclusive to dill, as the pickled objects would otherwise have to serialize all the dependency modules and the like.
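To check which modules a given pickle expects at load time, one option is to scan its opcodes with the standard library's pickletools instead of unpickling it. The following is only a sketch: referenced_globals is a hypothetical helper, not part of dill, and its handling of protocol 4+ pickles is best effort.

```python
import collections
import pickle
import pickletools

def referenced_globals(data):
    """List the (module, name) pairs a pickle will import when loaded.

    Hypothetical helper, not part of dill. It only parses opcodes, so it can
    be run on a pickle that cannot be loaded in the current environment.
    """
    found = []
    strings = []  # string arguments seen so far, used for STACK_GLOBAL
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            # Protocols 0-3: the argument is "module name" in one string.
            module, name = arg.split(" ", 1)
            found.append((module, name))
        elif opcode.name == "STACK_GLOBAL" and len(strings) >= 2:
            # Protocol 4+: module and name are the two preceding strings.
            # Best effort only: strings fetched from the memo are not tracked.
            found.append((strings[-2], strings[-1]))
        elif isinstance(arg, str):
            strings.append(arg)
    return found

# Demo on a plain standard-library pickle written with protocol 3.
data = pickle.dumps(collections.OrderedDict(), protocol=3)
print(referenced_globals(data))
```

Run on a dill pickle of a function, the list would also contain entries from dill._dill, a reminder that the loading side needs dill itself installed. A file that fails with the error in the question would presumably show an __app__ entry.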

You do however have some options. When a function is serialized, it likely also needs to store references in globals, and there are options for how those references are handled. For example, the default in dill is to store everything in globals -- so if your environment is doing something like injecting a module into globals when it starts, then functions pickled with dill will, by default, assume that same module is present when unpickling. Alternatively, you can limit what the function stores from globals by using dill.settings['recurse'] = True. This setting will attempt to recurse references into globals, and only store the objects in globals that the target function actually refers to.

Here you can see the difference:

Python 3.7.12 (default, Nov 11 2021, 17:34:58) 
[Clang 10.0.1 (clang-1001.0.46.4)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dill
>>> 
>>> def test():
...     print("It worked!")
... 
>>> result = dill.dumps(test)
>>> result
b'\x80\x03cdill._dill\n_create_function\nq\x00(cdill._dill\n_create_code\nq\x01(K\x00K\x00K\x00K\x02KCC\x0ct\x00d\x01\x83\x01\x01\x00d\x00S\x00q\x02NX\n\x00\x00\x00It worked!q\x03\x86q\x04X\x05\x00\x00\x00printq\x05\x85q\x06)X\x07\x00\x00\x00<stdin>q\x07X\x04\x00\x00\x00testq\x08K\x01C\x02\x00\x01q\t))tq\nRq\x0bc__builtin__\n__main__\nh\x08NN}q\x0cNtq\rRq\x0e.'
>>> 
>>> dill.settings['recurse'] = True
>>> dill.dumps(test)
b'\x80\x03cdill._dill\n_create_function\nq\x00(cdill._dill\n_create_code\nq\x01(K\x00K\x00K\x00K\x02KCC\x0ct\x00d\x01\x83\x01\x01\x00d\x00S\x00q\x02NX\n\x00\x00\x00It worked!q\x03\x86q\x04X\x05\x00\x00\x00printq\x05\x85q\x06)X\x07\x00\x00\x00<stdin>q\x07X\x04\x00\x00\x00testq\x08K\x01C\x02\x00\x01q\t))tq\nRq\x0b}q\x0cX\x05\x00\x00\x00printq\rcdill._dill\n_get_attr\nq\x0eX\x08\x00\x00\x00builtinsq\x0fX\x05\x00\x00\x00printq\x10\x86q\x11Rq\x12sh\x08NN}q\x13Ntq\x14Rq\x15.'

Also, dill interacts with the global dict differently when the target function is built in a file, as opposed to __main__. So the results you see will depend on whether you built your test in a file or in the interpreter.

There are other serialization variants in dill.settings, with recurse and byref the most relevant for functions. Using dill.settings['byref'] = True will store a function by name reference only if that function is not defined in the interpreter.
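If changing dill.settings globally is undesirable, the same options can be passed per call, since dill.dumps accepts recurse and byref as keyword arguments. A minimal sketch, assuming dill is installed; greet is a stand-in for the function you want to ship:

```python
import dill

def greet():
    # Stand-in for the function to be served; it returns its message so the
    # round trip below can be checked.
    return "It worked!"

# Per-call override: dill.settings is left untouched for other callers.
payload = dill.dumps(greet, recurse=True)

# Round trip in the same process as a quick sanity check. This does not
# prove the pickle is portable; only loading it in the target environment
# (for example a notebook without the Azure project) can show that.
restored = dill.loads(payload)
print(restored())
```

The bytes in payload are what would be written to the pkl file.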

You also might be able to temporarily pop the offending entries from the global dict by hand, or it might be possible to use something like exec to create a new global dict that hopefully doesn't include the offending entry. You can try either of those if changing the settings doesn't work for you.
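The exec idea could look roughly like the sketch below. It assumes the function's source is available as a string, and the "__main__" module name is an arbitrary choice made to mimic a function defined in the interpreter; neither detail comes from the question.

```python
# Source of the function to ship, kept as a string so it can be compiled
# into a namespace that holds nothing from the hosting application.
SOURCE = '''
def test():
    print("It worked!")
'''

# Fresh globals dict. "__main__" is an assumed module name, chosen to mimic
# a function defined in the interpreter.
namespace = {"__name__": "__main__"}
exec(SOURCE, namespace)

clean_test = namespace["test"]

# The function's globals are now exactly this small dict; exec inserts
# "__builtins__" on its own.
print(sorted(clean_test.__globals__))
```

clean_test can then be passed to dill.dumps in place of the original function. Whether the resulting file loads elsewhere still depends on the dill version and settings, so it is worth checking in the target environment.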

Mike McKerns