
I'm trying to pickle (using the dill extension) a workflow object from pyutilib.workflow under Python 2.7, as shown below. The end goal is to insert these workflow objects into a MongoDB database and pull them back out on the other end when needed:

from pyutilib import workflow
import testworkflow
from bson.binary import Binary
import pickle
import dill
import weakref

A = testworkflow.testTask()
w = workflow.Workflow()
w.add(A)

# dill.dump writes to the open file and returns None;
# the with block closes f, so no explicit close() is needed
with open('w.dill', 'wb') as f:
    dill.dump(w, f)

testworkflow.py only contains testTask(), which is written as follows:

import pyutilib.workflow

class testTask(pyutilib.workflow.Task):

    def __init__(self, *args, **kwds):
        pyutilib.workflow.Task.__init__(self, *args, **kwds)
        self.inputs.declare('x')
        self.inputs.declare('y')
        self.outputs.declare('z')

    def execute(self):
        self.z = self.x + self.y

But when I run the script to serialize the workflow object, I get a massive traceback from pickle.py that ends with a bare "AssertionError".

It seems to have trouble with frames like these:

File "/usr/lib/python2.7/pickle.py", line 649, in save_dict
  self._batch_setitems(obj.iteritems())
File "/usr/lib/python2.7/pickle.py", line 681, in _batch_setitems
  save(v)
File "/usr/lib/python2.7/pickle.py", line 286, in save
  f(self, obj) # Call unbound method with explicit self
File "/usr/local/lib/python2.7/dist-packages/dill/dill.py", line 905, in save_weakref
  pickler.save_reduce(_create_weakref, (refobj,), obj=obj)
File "/usr/lib/python2.7/pickle.py", line 405, in save_reduce
  self.memoize(obj)
File "/usr/lib/python2.7/pickle.py", line 244, in memoize
  assert id(obj) not in self.memo

The chunk above is, seriously, about 1% of the full traceback. Several frames point to the same line of code, so is it a circular reference problem? I'm completely new to this type of project, and I've searched for related questions, but none seem relevant enough.

Am I missing some newer libraries? Is there a better way to do this?

EDIT: as per Martijn Pieters' helpful comment

Pickling is a recursive process, which is why you see certain lines repeated. The process ends up back at an object that was pickled before (id(obj) in self.memo is true only if the object was already processed).

So how can I stop this condition from being triggered? Why can't pickling automatically ignore already-serialized chunks as a base case in recursion?
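For reference, as far as I understand it, the standard pickler normally does skip objects it has already written: the memo lets it emit a back-reference instead of serializing them again. A minimal sketch with plain `pickle` and built-in containers only (not the pyutilib objects), showing that shared and cyclic references survive a round trip:

```python
import pickle

# One dict referenced twice, plus a list that contains itself (a cycle).
shared = {'name': 'shared'}
container = [shared, shared]
container.append(container)

# Protocol 2 is available on both Python 2.7 and Python 3.
restored = pickle.loads(pickle.dumps(container, 2))

# The memo keeps identity: both slots point at the same restored dict,
# and the cycle points back at the restored list itself.
print(restored[0] is restored[1])
print(restored[2] is restored)
```

So ordinary cycles should not trigger the assertion by themselves. It presumably fires only when `memoize` is called a second time for an object that is already in the memo, which matches the `save_weakref` frame in the traceback.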

EDIT 2: output with 'dill.detect.trace(True)' enabled:

T4: <class 'pyutilib.workflow.workflow.Workflow'>
D2: <dict object at 0x7ffff23955c8>
T4: <class 'pyutilib.workflow.port.InputPorts'>
T4: <class 'pyutilib.workflow.port.Port'>
D2: <dict object at 0x7ffff20a9b40>
R1: <weakref at 0x7ffff238aaf8; to 'Workflow' at 0x7ffff2399d90>
F2: <function _create_weakref at 0x7ffff238c410>
D2: <dict object at 0x7ffff20a9e88> 
D2: <dict object at 0x7ffff2395c58>
T4: <class 'argparse.ArgumentParser'>
D2: <dict object at 0x7ffff20a3050>
F2: <function _compile at 0x7ffff7ed81b8>
D2: <dict object at 0x7ffff20a3398>
T4: <class 'argparse._HelpAction'>
D2: <dict object at 0x7ffff20a36e0>
T4: <class 'argparse._ArgumentGroup'>
D2: <dict object at 0x7ffff20a3c58>
D2: <dict object at 0x7ffff20a34b0>
D2: <dict object at 0x7ffff20a3168>
D2: <dict object at 0x7ffff20a3280>
T4: <class 'argparse._StoreFalseAction'>
T4: <class 'argparse._AppendConstAction'>
T4: <class 'argparse._StoreTrueAction'>
T4: <class 'argparse._CountAction'>
T4: <class 'argparse._StoreConstAction'>
T4: <class 'argparse._VersionAction'>
T4: <class 'argparse._StoreAction'>
T4: <class 'argparse._SubParsersAction'>
T4: <class 'argparse._AppendAction'>
D2: <dict object at 0x7ffff20a35c8>
F1: <function identity at 0x7ffff23a1d70>
F2: <function _create_function at 0x7ffff2389e60>
Co: <code object identity at 0x7ffff4c118b0, file "/usr/lib/python2.7/argparse.py", line 1591>
F2: <function _unmarshal at 0x7ffff2389cf8>
D4: <dict object at 0x7ffff4c1a050>
D2: <dict object at 0x7ffff20abd70>
D2: <dict object at 0x7ffff20a3910>
T4: <class 'argparse.HelpFormatter'>
T4: <class 'pyutilib.workflow.port.OutputPorts'>
D2: <dict object at 0x7ffff20a3b40>
D2: <dict object at 0x7ffff20ab050>
D2: <dict object at 0x7ffff2395d70>
D2: <dict object at 0x7ffff20a5050>
D2: <dict object at 0x7ffff2395e88>
T4: <class 'pyutilib.workflow.task.EmptyTask'>
D2: <dict object at 0x7ffff20a7398>
D2: <dict object at 0x7ffff20ab280>
R1: <weakref at 0x7ffff238aba8; to 'EmptyTask' at 0x7ffff20a6290>
T4: <class 'pyutilib.workflow.connector.DirectConnector'>
D2: <dict object at 0x7ffff20ab4b0>
D2: <dict object at 0x7ffff2395b40>
R1: <weakref at 0x7ffff238aaa0; to 'testTask' at 0x7ffff23999d0>
T4: <class 'testworkflow.testTask'>
D2: <dict object at 0x7ffff2398d70>
D2: <dict object at 0x7ffff2395a28>
R1: <weakref at 0x7ffff238aaa0; to 'testTask' at 0x7ffff23999d0>
D2: <dict object at 0x7ffff20a9d70>
D2: <dict object at 0x7ffff20a97f8>
R1: <weakref at 0x7ffff238ab50; to 'EmptyTask' at 0x7ffff2399fd0>
D2: <dict object at 0x7ffff20a5168>
D2: <dict object at 0x7ffff20a5280>
R1: <weakref at 0x7ffff238ab50; to 'EmptyTask' at 0x7ffff2399fd0>
D2: <dict object at 0x7ffff20a55c8>
D2: <dict object at 0x7ffff20a5910>
D2: <dict object at 0x7ffff20a5c58>
D2: <dict object at 0x7ffff20a7280>
D2: <dict object at 0x7ffff20a5a28>
D2: <dict object at 0x7ffff20a56e0>
D2: <dict object at 0x7ffff20a57f8>
D2: <dict object at 0x7ffff20a5b40>
F1: <function identity at 0x7ffff23a1de8>
D4: <dict object at 0x7ffff4c1a050>
D2: <dict object at 0x7ffff20b4280>
D2: <dict object at 0x7ffff20a5e88>
D2: <dict object at 0x7ffff20a7168>
D2: <dict object at 0x7ffff20a9c58>
D2: <dict object at 0x7ffff20ab168>
D2: <dict object at 0x7ffff2395910>
D2: <dict object at 0x7ffff20a5398>
D2: <dict object at 0x7ffff20a75c8>
D2: <dict object at 0x7ffff20a54b0>
D2: <dict object at 0x7ffff20a74b0>
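To narrow down which attribute is responsible, one rough approach is to try serializing each instance attribute on its own. The helper below is a hypothetical sketch (the name `find_unpicklable_attrs` and the `Holder`/`Target` classes are made up for illustration), and it uses plain `pickle`, which is stricter than dill: it rejects weakrefs outright, so on the real workflow objects `dill.dumps` would have to be swapped in.

```python
import pickle
import weakref


def find_unpicklable_attrs(obj):
    """Return the names of instance attributes that fail to pickle on their own."""
    failed = []
    for name, value in sorted(vars(obj).items()):
        try:
            pickle.dumps(value, 2)
        except Exception:
            # Any failure counts; the exception type varies between versions.
            failed.append(name)
    return failed


class Target(object):
    """Stand-in for whatever the weakref points to."""


class Holder(object):
    """Toy object with one picklable attribute and one weakref attribute."""

    def __init__(self, target):
        self.count = 3
        self.ref = weakref.ref(target)


target = Target()
bad_attrs = find_unpicklable_attrs(Holder(target))
print(bad_attrs)
```

This only checks top-level attributes; for nested containers it would have to recurse.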
Connor G.
  • Pickling is a recursive process, which is why you see certain lines repeated. The process ends up back at an object that was pickled before (`id(obj) in self.memo` is true only if the object was already processed). – Martijn Pieters Mar 29 '15 at 15:56
  • Okay, so how can I eliminate this problem? Do I have to monkeypatch the dill code or something? – Connor G. Mar 29 '15 at 15:57
  • Sorry, I don't know; I am not familiar enough with what `dill` does, or what is stored in `pyutilib.workflow` objects. – Martijn Pieters Mar 29 '15 at 15:59
  • You can try running your code again with `dill.detect(True)`, and then look at what the trace is as things are pickled. You can also dig around in the existing object with `dill.detect.badobjects` and other tools in `dill.detect`. – Mike McKerns Mar 30 '15 at 02:41
  • also, which `dill` version are you using? The recent release or the github trunk? It seems like it's something that is an object that `dill` does not know how to serialize. There are different ways to help `dill` learn how to handle the object… so post the relevant part of the traceback with `dill.detect(True)`. – Mike McKerns Mar 30 '15 at 11:50
  • Dill 0.2.2 I believe. adding 'dill.detect(True)', compiling and running didn't yield anything new other than the traceback. Is there a better way to do this? – Connor G. Mar 30 '15 at 14:40
  • Is there a way to hack dill and utilize iteration instead of recursion? May sound dumb but I'm not intimately familiar with dill's internal workings – Connor G. Mar 30 '15 at 14:48
  • And the traceback with `dill.detect(True)` is…? It should have new information. You could edit your question to show the new traceback. – Mike McKerns Mar 31 '15 at 12:16
  • I added that line of code and didn't get any new traceback, just the same thing, what am I doing wrong? EDIT: Or, rather, where should that line be in my code? – Connor G. Mar 31 '15 at 14:02
  • Add the line after the import statements. The end of the traceback should be the same, but there should be new lines scattered through the traceback telling you what objects are being serialized as `dill` and `pickle` go to work. – Mike McKerns Mar 31 '15 at 14:16
  • Get 'TypeError: 'module' object is not callable' when doing so – Connor G. Mar 31 '15 at 14:46
  • …oops, my bad. Try: `dill.detect.trace(True)`. – Mike McKerns Mar 31 '15 at 21:35
  • Tried that, got 'AttributeError: 'module' object has no attribute 'detect' – Connor G. Apr 01 '15 at 15:51
  • Got it, I think, see my edits above – Connor G. Apr 01 '15 at 16:00
  • It looks like it dies trying to pickle something that uses a `weakref`. `dill` can pickle a `weakref`, but if the object it refers to is not serializable, it'll fail. I also see that your `testTask` is not serializable. I looked at the code for `pyutilib` and it looks like a `workflow.Task` does some "intelligent" stuff under the covers. – Mike McKerns Apr 04 '15 at 19:44
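
One general way to make weakref-holding objects picklable is to define `__getstate__`/`__setstate__` so the weakref is replaced by its target before serialization and rebuilt afterwards. The sketch below uses hypothetical `Owner`/`Child` classes and plain `pickle`; it is an assumption that something similar could be applied to (or subclassed onto) the pyutilib port and task classes, since their internals are not shown here.

```python
import pickle
import weakref


class Owner(object):
    """Holds strong references to its children."""

    def __init__(self):
        self.children = []

    def add(self, child):
        child.set_owner(self)
        self.children.append(child)


class Child(object):
    """Points back at its owner through a weakref, which plain pickle rejects."""

    def __init__(self, name):
        self.name = name
        self._owner_ref = None

    def set_owner(self, owner):
        self._owner_ref = weakref.ref(owner)

    def owner(self):
        return self._owner_ref() if self._owner_ref is not None else None

    def __getstate__(self):
        # Swap the weakref for the object it points to (None if unset or dead).
        state = self.__dict__.copy()
        ref = state.pop('_owner_ref')
        state['_owner_obj'] = ref() if ref is not None else None
        return state

    def __setstate__(self, state):
        # Rebuild the weakref from the restored owner.
        owner = state.pop('_owner_obj')
        self.__dict__.update(state)
        self._owner_ref = weakref.ref(owner) if owner is not None else None


owner = Owner()
owner.add(Child('a'))

restored = pickle.loads(pickle.dumps(owner, 2))
print(restored.children[0].name)
print(restored.children[0].owner() is restored)
```

The back-reference is written as an ordinary memo lookup, so the owner is serialized once and the child's weakref is recreated against the restored owner. The trade-off is that the pickled state temporarily holds a strong reference where the live object only held a weak one.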
