
I'm trying to serialize a large (and possibly very unpicklable) object to a file for later use.

No complaints on the dill.dump(file) side:

In [1]: import echonest.remix.audio as audio

In [2]: import dill

In [3]: audiofile = audio.LocalAudioFile("/Users/path/Track01.mp3")
en-ffmpeg -i "/Users/path/audio/Track01.mp3" -y -ac 2 -ar 44100 "/var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmpWbonbH.wav"
Computed MD5 of file is b3820c166a014b7fb8abe15f42bbf26e
Probing for existing analysis

In [4]: with open('audio_object_dill.pkl', 'wb') as f:
   ...:     dill.dump(audiofile, f)
   ...:  

In [5]: 

But trying to load the .pkl file:

In [1]: import dill

In [2]: with open('audio_object_dill.pkl', 'rb') as f:
   ...:     audio_object = dill.load(f)
   ...:  

returns the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-203b696a7d73> in <module>()
      1 with open('audio_object_dill.pkl', 'rb') as f:
----> 2     audio_object = dill.load(f)
      3 

/Users/mikekilmer/Envs/GLITCH/lib/python2.7/site-packages/dill-0.2.2.dev-py2.7.egg/dill/dill.pyc in load(file)
    185     pik = Unpickler(file)
    186     pik._main_module = _main_module
--> 187     obj = pik.load()
    188     if type(obj).__module__ == _main_module.__name__: # point obj class to main
    189         try: obj.__class__ == getattr(pik._main_module, type(obj).__name__)

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.pyc in load(self)
    856             while 1:
    857                 key = read(1)
--> 858                 dispatch[key](self)
    859         except _Stop, stopinst:
    860             return stopinst.value

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.pyc in load_newobj(self)
   1081         args = self.stack.pop()
   1082         cls = self.stack[-1]
-> 1083         obj = cls.__new__(cls, *args)
   1084         self.stack[-1] = obj
   1085     dispatch[NEWOBJ] = load_newobj

TypeError: __new__() takes at least 2 arguments (1 given)

The audio object is much larger and more complex than the simple class instance these dill calls were demonstrated on (in an SO answer). I'm unclear whether I need to pass a second argument via dill and, if so, what that argument would be, or how to tell whether any approach to pickling is viable for this specific object.
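For context, the same kind of TypeError can be reproduced with plain pickle whenever a class defines a `__new__` that requires an argument. The sketch below uses a hypothetical `Track` class, not the real `LocalAudioFile`, and assumes (without confirmation from the echonest source) that a similar `__new__` signature is involved here. It is written for Python 3, where the message wording differs from the Python 2 traceback above.

```python
import pickle


class Track(object):
    """Hypothetical stand-in for a class whose __new__ requires an argument."""

    def __new__(cls, filename):
        return object.__new__(cls)

    def __init__(self, filename):
        self.filename = filename


class TrackWithNewArgs(Track):
    """Same class, but it tells pickle which arguments to pass to __new__."""

    def __getnewargs__(self):
        return (self.filename,)


# Dumping succeeds: pickle records the class and the instance __dict__ only.
payload = pickle.dumps(Track("Track01.mp3"), protocol=2)

# Loading fails: the unpickler calls Track.__new__(Track) with no arguments.
try:
    pickle.loads(payload)
    load_error = None
except TypeError as exc:
    load_error = exc

# With __getnewargs__ defined, the round trip works.
restored = pickle.loads(pickle.dumps(TrackWithNewArgs("Track01.mp3"), protocol=2))
print(type(load_error).__name__, restored.filename)
```

So the "missing argument" is an argument to the class's own `__new__`, not something passed to `dill.load`. Defining `__getnewargs__` is one generic way to supply it; whether that is appropriate for a third-party class is a separate question.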

Examining the object itself a bit:

In [4]: for k, v in vars(audiofile).items():
...:     print k, v
...: 

returns:

is_local False
defer False
numChannels 2
verbose True
endindex 13627008
analysis <echonest.remix.audio.AudioAnalysis object at 0x103c61bd0>
filename /Users/mikekilmer/Envs/GLITCH/glitcher/audio/Track01.mp3
convertedfile /var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmp9ADD_Z.wav
sampleRate 44100
data [[0 0]
 [0 0]
 [0 0]
 ..., 
 [0 0]
 [0 0]
 [0 0]]

Also, audiofile.analysis has an attribute called source, and audiofile.analysis.source in turn has its own analysis attribute (audiofile.analysis.source.analysis), so the two objects apparently point back at each other.
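As a side note, a back-reference like this is not by itself a problem for pickle, which keeps a memo of objects it has already seen. A minimal sketch, using hypothetical `AudioFile` and `Analysis` classes rather than the echonest ones:

```python
import pickle


class Analysis(object):
    """Hypothetical analysis object that keeps a pointer to its source."""

    def __init__(self, source):
        self.source = source


class AudioFile(object):
    """Hypothetical audio object that owns an analysis."""

    def __init__(self, filename):
        self.filename = filename
        self.analysis = Analysis(self)


original = AudioFile("Track01.mp3")
copy = pickle.loads(pickle.dumps(original, protocol=2))

# The cycle survives the round trip: the analysis points at the new object.
cycle_preserved = copy.analysis.source is copy
print(cycle_preserved)
```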

MikeiLL
  • exploring the docs a bit more in depth - contained at https://pypi.python.org/pypi/dill – MikeiLL Aug 28 '14 at 00:56
  • Am reading in https://docs.python.org/2/library/pickle.html that "file must have two methods". Maybe the file I'm saving only has one method and that's the missing second argument that's breaking `cls.__new__(cls, *args)` – MikeiLL Aug 28 '14 at 01:00
  • Is the `echonest` API something I could grab a hold of to try out if needed? Anyway, there are a few things that you can try to discover what's going on. First, since it's a class, you can try to toggle the `byref` in `dill.dumps`, to toggle pickling the class "by reference". If that doesn't work, try turning on `dill.detect.trace(True)` to see internal checkpoints in the (de)serialization. You can also look at methods in `dill.detect`, such as `badobjects`, that can help diagnose what's going on. It looks like a mismatch in `__getstate__` and `__setstate__`, which would be weird. – Mike McKerns Aug 28 '14 at 09:27
  • It is, @MikeMcKerns. http://echonest.com. There are two relevant modules available via http://developer.echonest.com/ and I shared the procedure at: http://www.mzoo.org/getting-the-python-echonest-remix-package-running. Are https://pypi.python.org/pypi/dill, http://trac.mystic.cacr.caltech.edu/project/pathos/wiki/dill and the pickle docs basically the extent of the reading material I should be looking at (in implementing the above recommendations)? – MikeiLL Aug 28 '14 at 15:58
  • If using `with open('audio_object_dill.pkl', 'wb') as f:` byref would be set like this, `dill.dump(audiofile, f, byref=True)` with `False` being the default, right? `dill.load` results are the same. Entered `dill.detect.trace(True)` prior to `dill.dump` call results: http://pastebin.com/V0fA7aVJ. Lastly, `dill.detect.badobjects(audiofile)` returns `[]`. Hmph. – MikeiLL Aug 28 '14 at 16:57
  • First of all, wow is this a cool package. Digging around a bit, `dill.detect.children(audiofile, echonest.remix.audio.LocalAudioFile)` yields name 'echonest' is not defined - actually simply had to call it with the variable module was imported with: `dill.detect.children(audiofile, audio.LocalAudioFile)`, which yields our old friend `[]` – MikeiLL Aug 28 '14 at 19:49
  • Wait! Apparently the API has a built-in method: http://echonest.github.io/remix/apidocs/echonest.remix.audio-pysrc.html#LocalAudioFile.save – MikeiLL Aug 28 '14 at 19:59
  • Geez that's a horrendous `trace` you have in the pastebin. Yes, that's all the reading material on `dill`, unfortunately. By the way, you should try `badobjects(audiofile, depth=1)` -- that allows you to dig into each object, even ones that fail. Also check out this as an example of what dill detection can do. http://stackoverflow.com/questions/10082241/how-to-get-a-python-functions-dependencies-for-pickling http://stackoverflow.com/questions/25241139/pickle-error-assert-idobj-not-in-self-memo – Mike McKerns Aug 29 '14 at 01:12
  • Definitely had played with badobjects(audiofile, depth=1), but it hangs giving `f(self, obj) # Call unbound method with explicit self` in pickle's `save` method. – MikeiLL Aug 29 '14 at 13:39
  • So, did the built-in "save" work from the API? It looked like that may be what they expect you to use (instead of `dump`, directly), and that might be why it seems like `load` expects something different. – Mike McKerns Aug 29 '14 at 22:20
  • Yes. Built-in save works and re-loads beautifully using Dill. – MikeiLL Aug 30 '14 at 01:06
  • then you should answer your own question(s), as others might run into the same thing. – Mike McKerns Aug 30 '14 at 15:21
  • or I'll answer it. someone should, so people don't need to dig into the comments. – Mike McKerns Aug 31 '14 at 20:30
  • @MikeMcKerns I will answer it and look forward to having the opportunity to. Probably tomorrow, and thank you for the reminder. Have been thinking about it. I think I might even have the S.O. cred to add echonest as a keyword. – MikeiLL Sep 01 '14 at 18:46

1 Answer

In this case, the answer lay within the module itself.

The LocalAudioFile class provides (and each of its instances can therefore use) its own save method, called via LocalAudioFile.save or, more likely, the_audio_object_instance.save.

In the case of an .mp3 file, the LocalAudioFile instance consists of a pointer to a temporary .wav file, which is the decompressed version of the .mp3, along with a whole bunch of analysis data that the (internet-based) Echonest API returns for the original audio file.

LocalAudioFile.save calls shutil.copyfile(path_to_wave, wav_path) to save the .wav file under the same name and path as the original file linked to the audio object, and returns an error if that file already exists. It then calls pickle.dump(self, f) to save the analysis data to a file in the same directory as the original audio file.

The LocalAudioFile object can be reintroduced simply via pickle.load().
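The save-then-reload pattern described above can be sketched roughly as follows. This is not the echonest source: `SavableAudio` is a hypothetical stand-in, the `.wav` naming is an assumption, and only the `.analysis.en` suffix is taken from the session output further down.

```python
import os
import pickle
import shutil
import tempfile


class SavableAudio(object):
    """Hypothetical stand-in; not the real LocalAudioFile."""

    def __init__(self, filename, convertedfile, duration):
        self.filename = filename            # original audio path
        self.convertedfile = convertedfile  # temporary decoded .wav
        self.duration = duration

    def save(self):
        wav_path = self.filename + ".wav"  # naming is an assumption
        analysis_path = self.filename + ".analysis.en"
        if os.path.exists(wav_path):
            raise IOError("File already exists: " + wav_path)
        # Keep a copy of the decoded audio next to the original file.
        shutil.copyfile(self.convertedfile, wav_path)
        # Pickle the whole object, analysis data included.
        with open(analysis_path, "wb") as f:
            pickle.dump(self, f)
        return analysis_path


workdir = tempfile.mkdtemp()
temp_wav = os.path.join(workdir, "tmp_decoded.wav")
with open(temp_wav, "wb") as f:
    f.write(b"RIFF")  # placeholder bytes, not a real .wav file

audio_obj = SavableAudio(os.path.join(workdir, "Track01.mp3"), temp_wav, 308.96)
saved_to = audio_obj.save()

# Reload the object from the pickle written by save().
with open(saved_to, "rb") as f:
    reloaded = pickle.load(f)
print(reloaded.duration)
```

The point of routing through a class-provided save method is that the class decides what gets written and where, so the matching load knows what to expect.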

Here's an IPython session in which I used dill, a very useful wrapper around pickle that offers most of the standard pickle methods plus a bunch more:

In [1]: import echonest.remix.audio as audio

In [2]: import dill
# create the audio_file object
In [3]: audiofile = audio.LocalAudioFile("/Users/mikekilmer/Envs/GLITCH/glitcher/audio/Track01.mp3")
en-ffmpeg -i "/Users/path/audio/Track01.mp3" -y -ac 2 -ar 44100 "/var/folders/X2/X2KGhecyG0aQhzRDohJqtU+++TI/-Tmp-/tmp_3Ei0_.wav"
Computed MD5 of file is b3820c166a014b7fb8abe15f42bbf26e
Probing for existing analysis
#call the LocalAudioFile save method
In [4]: audiofile.save()
Saving analysis to local file /Users/path/audio/Track01.mp3.analysis.en
#confirm the object is valid by reading its duration attribute
In [5]: audiofile.duration
Out[5]: 308.96
#delete the object - there's probably a "correct" way to do this
In [6]: audiofile = 0
#confirm it's no longer an audio_object
In [7]: audiofile.duration
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-04baaeda53a4> in <module>()
----> 1 audiofile2.duration

AttributeError: 'int' object has no attribute 'duration'


#open the pickled version (using dill)
In [8]: with open('/Users/path/audio/Track01.mp3.analysis.en', 'rb') as f:
   ....:     audiofile = dill.load(f)
   ....:     
#confirm it's a valid LocalAudioFile object
In [9]: audiofile.duration
Out[9]: 308.96

Echonest is a very robust API and the remix package provides a ton of functionality. There's a small list of relevant links assembled here.

MikeiLL