I am getting a module dependency failure when I unpickle, on Windows, an object that was pickled on Linux. The pickle loads correctly on Linux but fails on Windows. Both systems are running Python 2.6.

I have studied the pickle manual page (with particular focus on having the unpickle environment the same as the pickle environment), and several pieces of great advice here - yet I remain stumped. Most are advising to make sure the sys.modules are correct and loaded. Here are some snippets that show what I'm attempting:

The pickling code:

...
pickle_fp = self.getPickleFile('wb')
pickler = Pickler(pickle_fp, protocol=2)
pickler.dump(archive)
pickle_fp.close()
...

In the unpickling code I have added a line to print out the sys.modules dictionary so we can see the modules present:

...
pickle_fp = self.getPickleFile('rb')
unpickler = Unpickler(pickle_fp)
pprint.pprint(sys.modules)
package = unpickler.load()
pickle_fp.close()
...

When I run the unpickling code on Linux it works great. When I try to load the Linux-generated pickle on Windows I get:

...
ImportError: No module named photo_data

As for the environment, pprint.pprint(sys.modules) produces, on Linux:

...
'photo_data': <module 'photo_data' from 
'/home/xxx/Desktop/PythonPhoto/photo_data.pyc'>,
...

and on Windows

...
'photo_data': <module 'photo_data' from                       
'C:\Users\xxx\git\PhotoManagement\Photo\src\photo_data.pyc'>,
...

So it appears to me that I've got the photo_data module in the environment. I tried using pickle without the protocol (defaulting to 0) and I tried running unix2dos to strip characters. I'm officially stumped.

Thanks for your help!


Based on suggestions in the comments I have generated the simplest case that does not work. The classes I'm pickling look like this:

import datetime
import sys

class photo_data:
    def __init__(self):
        self.isdir = False
        self.size = 0
        self.mtime = -(sys.maxint - 1) #Set default time to very old
        self.timestamp = datetime.datetime.strptime('1700:1:1 00:00:00', '%Y:%m:%d %H:%M:%S')
        self.gotTags = False
        self.signature = ''
        self.fileMD5 = ''
        self.userTags = ''
        self.inArchive = False
        self.candidates = []
        self.dirpaths = []
        self.filepaths = []      

class photo_collection:  #This class should be data only

    def __init__(self):
        self.host = ''
        self.path = ''
        self.photo = dict()
        self.pickle = None
        self.datasetChanged = False

    def __getitem__(self, key):
        return self.photo[key]

    def __setitem__(self, key, value):
        self.photo[key] = value

In my use case, the photo_collection object is instantiated, and the dictionary self.photo is filled with instances of photo_data. The simplest case that works between systems is a directory with one photo in it, and arbitrarily complex cases work within a system. The simplest case that doesn't work between systems is a directory with one photo and one subdirectory which also contains a photo.

Per request, I have attached two pickle files saved with protocol 0. Comparing them, I see that the program descended the file tree in a different order (maybe not a big surprise, since I copied the directory between systems and OSes), but otherwise they seem to open and close with identical structure, apart from file-specific data. I don't see how to upload files to a separate place, so I include them inline here.

This is the Windows-generated pickle:

(iphoto_data
photo_collection
p0
(dp1
S'path'
p2
S'C:\\Users\\scott_jackson\\Desktop\\phototest'
p3
sS'host'
p4
S'4DAA1001312'
p5
sS'pickle'
p6
NsS'datasetChanged'
p7
I01
sS'photo'
p8
(dp9
S'C:\\Users\\scott_jackson\\Desktop\\phototest\\img_4697.jpg'
p10
(iphoto_data
photo_data
p11
(dp12
S'isdir'
p13
I00
sS'dirpaths'
p14
(lp15
sS'filepaths'
p16
(lp17
sS'timestamp'
p18
cdatetime
datetime
p19
(S'\x07\xda\x04\x12\x124&\x00\x00\x00'
p20
tp21
Rp22
sS'gotTags'
p23
I01
sS'signature'
p24
S'9b2ca527b2bf0865d9b87ecd2a68d417'
p25
sS'fileMD5'
p26
S''
p27
sS'candidates'
p28
(lp29
sS'mtime'
p30
F1347576558.0
sS'inArchive'
p31
I00
sS'userTags'
p32
S'NA'
p33
sS'size'
p34
L6489323L
sbsg3
(iphoto_data
photo_data
p35
(dp36
g13
I01
sg14
(lp37
S'C:\\Users\\scott_jackson\\Desktop\\phototest\\060101 Nags Head'
p38
asg16
(lp39
g10
asg18
g19
(S'\x06\xa4\x01\x01\x00\x00\x00\x00\x00\x00'
p40
tp41
Rp42
sg23
I00
sg24
g27
sg26
g27
sg28
(lp43
sg30
I-2147483646
sg31
I00
sg32
g27
sg34
I0
sbsS'C:\\Users\\scott_jackson\\Desktop\\phototest\\060101 Nags Head\\img_1150.jpg'
p44
(iphoto_data
photo_data
p45
(dp46
g13
I00
sg14
(lp47
sg16
(lp48
sg18
g19
(S'\x07\xd6\x01\x01\x11\t#\x00\x00\x00'
p49
tp50
Rp51
sg23
I01
sg24
S'5925063685af0d741a23fe6d75523741'
p52
sg26
g27
sg28
(lp53
sg30
F1347751812.0
sg31
I00
sg32
g33
sg34
L538233L
sbsS'C:\\Users\\scott_jackson\\Desktop\\phototest\\060101 Nags Head'
p54
(iphoto_data
photo_data
p55
(dp56
g13
I01
sg14
(lp57
sg16
(lp58
g44
asg18
g19
(S'\x06\xa4\x01\x01\x00\x00\x00\x00\x00\x00'
p59
tp60
Rp61
sg23
I00
sg24
g27
sg26
g27
sg28
(lp62
sg30
I-2147483646
sg31
I00
sg32
g27
sg34
I0
sbssb.

And this is the Linux-generated pickle:

(iphoto_data
photo_collection
p0
(dp1
S'path'
p2
S'/home/scott/phototest'
p3
sS'host'
p4
S'barney'
p5
sS'pickle'
p6
NsS'datasetChanged'
p7
I01
sS'photo'
p8
(dp9
S'/home/scott/phototest/060101 Nags Head/img_1150.jpg'
p10
(iphoto_data
photo_data
p11
(dp12
S'isdir'
p13
I00
sS'dirpaths'
p14
(lp15
sS'filepaths'
p16
(lp17
sS'timestamp'
p18
cdatetime
datetime
p19
(S'\x07\xd6\x01\x01\x11\t#\x00\x00\x00'
p20
tp21
Rp22
sS'gotTags'
p23
I01
sS'signature'
p24
S'5925063685af0d741a23fe6d75523741'
p25
sS'fileMD5'
p26
S''
p27
sS'candidates'
p28
(lp29
sS'mtime'
p30
F1347751812.0842018
sS'inArchive'
p31
I00
sS'userTags'
p32
S'NA'
p33
sS'size'
p34
I538233
sbsS'/home/scott/phototest/060101 Nags Head'
p35
(iphoto_data
photo_data
p36
(dp37
g13
I01
sg14
(lp38
sg16
(lp39
g10
asg18
g19
(S'\x06\xa4\x01\x01\x00\x00\x00\x00\x00\x00'
p40
tp41
Rp42
sg23
I00
sg24
g27
sg26
g27
sg28
(lp43
sg30
I-9223372036854775806
sg31
I00
sg32
g27
sg34
I0
sbsS'/home/scott/phototest/img_4697.jpg'
p44
(iphoto_data
photo_data
p45
(dp46
g13
I00
sg14
(lp47
sg16
(lp48
sg18
g19
(S'\x07\xda\x04\x12\x124&\x00\x00\x00'
p49
tp50
Rp51
sg23
I01
sg24
S'9b2ca527b2bf0865d9b87ecd2a68d417'
p52
sg26
g27
sg28
(lp53
sg30
F1347576558.5344362
sg31
I00
sg32
g33
sg34
I6489323
sbsg3
(iphoto_data
photo_data
p54
(dp55
g13
I01
sg14
(lp56
S'/home/scott/phototest/060101 Nags Head'
p57
asg16
(lp58
g44
asg18
g19
(S'\x06\xa4\x01\x01\x00\x00\x00\x00\x00\x00'
p59
tp60
Rp61
sg23
I00
sg24
g27
sg26
g27
sg28
(lp62
sg30
I-9223372036854775806
sg31
I00
sg32
g27
sg34
I0
sbssb.

Please feel free to paste these into your favorite 'compare' editor; I don't know enough about pickles to detect the problem.

Thanks in advance for your help!!

labroid
  • run python and print sys.path please? – unddoch Sep 13 '12 at 18:58
  • I'm running inside Eclipse. There is an extensive PYTHONPATH defined; I haven't figured out as yet how to print it out....I can only highlight one line at a time. Hmmmm.. – labroid Sep 13 '12 at 19:18
  • http://pydev.org/manual_adv_interactive_console.html for running python, http://pydev.org/manual_101_interpreter.html for configuring PYTHONPATH. you should add that to the question – unddoch Sep 13 '12 at 19:28
  • Got it. From within Eclipse, my PYTHONPATH is: C:\Users\xxx\Documents\Personal\Programming\eclipse\plugins\org.python.pydev_2.4.0.2012020116\PySrc\pydev_sitecustomize;C:\Users\xxx\git\PhotoManagement\Photo\src;C:\Python26;C:\Python26\DLLs;C:\Python26\lib;C:\Python26\lib\lib-tk;C:\Python26\lib\plat-win;C:\Python26\lib\site-packages;C:\Python26\lib\site-packages\PIL;C:\Python26\lib\site-packages\Pythonwin;C:\Python26\lib\site-packages\nose-0.11.3-py2.6.egg;C:\Python26\lib\site-packages\win32;C:\Python26\lib\site-packages\win32\lib;C:\Python26\lib\site-packages\wx-2.8-msw-unicode – labroid Sep 13 '12 at 19:45
  • The linux PYTHONPATH is probably more important here, because you're trying to figure out how the photo_data module got pulled in on linux, not why you can't find it on Windows. (The answer to the latter is obvious—it's not part of the standard library, and you didn't install it, so it's not there to find.) – abarnert Sep 13 '12 at 20:59
  • well, he has it installed (read the question), and it's on the path (C:\Users\xxx\git\PhotoManagement\Photo\src) so it's really weird :/ – unddoch Sep 13 '12 at 21:10
  • Maybe you could call pdb.set_trace() to find out more about the innards of the error. Abstractly, neither .pyc files nor pickle files are made to be safe and compatible across Python versions and different platforms, so I suspect that it might be a compatibility problem, resulting in a general ImportError? Just a possible direction... – Boris Burkov Sep 13 '12 at 21:30
  • could you post the simplest versions of your archive that is pickled with protocol 0 and throws this error? (pickled on both windows and linux) – User Sep 13 '12 at 22:20
  • Great suggestion. I made sure both were Python 2.6, I deleted all my .pyc files to force a 'recompile', I changed to protocol 0, and I changed from `cPickle` to `pickle`. Same results. I am trying to figure out how to use `pdb.set_trace()`, and will look into creating simple pickles. This will take some hours. Thanks for all the help. – labroid Sep 13 '12 at 22:34
  • Well, the degenerate case works between systems. (My pickle is an instance of a class, that contains instances of another class. The degenerate case is one instance with one child instance.) So now I need to 'grow' that case to my 'real' case (which is ~250,000 children) and try again. Any reason a pickle might not work with an object that contains 250,000 children? Thanks! – labroid Sep 13 '12 at 23:41
  • I have to suspect that there's something about the directory structure. Are there any files or directories named `photo_data` nearby (in current directory, on `sys.path`, etc)? – nneonneo Sep 16 '12 at 00:28
  • And your Linux pickle that you posted does not load on your Windows machine? – nneonneo Sep 16 '12 at 00:29

2 Answers

I've reproduced your problem by saving the pickle with CRLF (Windows) line endings on my Mac OS X machine.

The pickle machinery is quite particular about newlines. If the pickle is saved or copied using a non-binary transfer mode (e.g. resaved with a text editor on Windows, copied using ASCII FTP transfer, saved as a text file from a website, etc.), the pickle will be corrupted with the addition of CR characters.

Now, the problem is this line in pickle.py:

module = self.readline()[:-1]

If the file you give the Unpickler is opened 'rb' but contains CRLF, then these lines will read module = "photo_data\r", which isn't a valid module name. Upon importing, the error will appear as

ImportError: No module named photo_data

with an unprinted carriage return after the photo_data (dreadfully sneaky!).
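
The effect of that slice can be reproduced without pickle at all. The following is a minimal sketch, written for Python 3 rather than the Python 2.6 used in the question. `read_name` is a hypothetical helper that only mimics the quoted line; it is not part of `pickle`.

```python
import io


def read_name(fp):
    # Mimics `self.readline()[:-1]`: read one line and drop only its last byte.
    return fp.readline()[:-1]


# The same two lines of a protocol 0 pickle, with LF and with CRLF endings.
clean = io.BytesIO(b"photo_data\nphoto_collection\n")
mangled = io.BytesIO(b"photo_data\r\nphoto_collection\r\n")

clean_name = read_name(clean)      # the "\n" is dropped, nothing is left over
mangled_name = read_name(mangled)  # only the "\n" is dropped, the "\r" stays

# repr() makes the stray carriage return visible.
print(repr(clean_name))
print(repr(mangled_name))
```

Printing the name with `repr()` is what exposes the trailing carriage return that the plain ImportError message hides.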

The solution is to transfer the files in binary mode, and not to run unix2dos or any similar utility on the pickle. Alternatively, if you are using protocol 0 (text) pickles, it is safe to open the pickled files with 'rU' (universal newline mode) instead.
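
As a rough illustration of the same idea on raw bytes, here is a sketch in Python 3 that detects and undoes the LF to CRLF conversion. `looks_crlf_mangled` and `repair_protocol0` are hypothetical names, and the approach is assumed to apply to protocol 0 pickles only: a binary-protocol pickle can legitimately contain the bytes `\r\n`, so neither the check nor the repair is reliable there.

```python
import pickle


def looks_crlf_mangled(data):
    """Heuristic: True if the raw pickle bytes contain a CRLF pair."""
    return b"\r\n" in data


def repair_protocol0(data):
    """Undo an LF -> CRLF conversion. Only meaningful for protocol 0 pickles."""
    return data.replace(b"\r\n", b"\n")


# Sample data loosely modelled on the question; the values are illustrative.
sample = {"path": "/home/scott/phototest", "size": 538233}
original = pickle.dumps(sample, protocol=0)

# Simulate what a text-mode (ASCII) transfer does to the file.
mangled = original.replace(b"\n", b"\r\n")

repaired = repair_protocol0(mangled)
restored = pickle.loads(repaired)
print(looks_crlf_mangled(mangled), restored == sample)
```

Repairing after the fact is a fallback; keeping the transfer binary in the first place avoids the problem for every protocol.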

See also Pickled file won't load on Mac/Linux.

nneonneo
  • That's it! I tried on all of the file variants I used, and `'rU'` fixes the problem. And you are right, `unix2dos` does _not_ fix the problem. I changed from using `FTP` to `winscp` a few weeks ago - I wonder if that's the issue. What do you use for guaranteed binary transfer between systems? And thanks a million - I've wasted many hours on this... – labroid Sep 16 '12 at 11:54
  • Just to share a little more that I learned: `winscp` changes transfer mode depending on the file extension. I had arbitrarily added `.txt` to the end of the pickle file name so I could open it automatically with a text editor, only to learn that `winscp` has a configurable list of file types that it transfers in ASCII mode instead of binary mode. So changing the file extension silently changed the transfer mode. Live and learn. Thanks again for all of your help! – labroid Sep 16 '12 at 14:05
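
Whatever tool does the copying, one way to confirm that a transfer left the pickle untouched is to compare a digest of the file on both machines. This is a minimal sketch in Python 3; `file_digest` and the file names are hypothetical, and the two temporary files stand in for the copies on the sending and receiving systems.

```python
import hashlib
import os
import tempfile


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in binary mode."""
    digest = hashlib.sha256()
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    sent = os.path.join(tmp, "archive.pkl")
    received = os.path.join(tmp, "archive_received.pkl")

    payload = b"(lp0\nI1\na."  # a tiny protocol 0 pickle of the list [1]
    with open(sent, "wb") as fp:
        fp.write(payload)
    # Simulate an ASCII-mode transfer that rewrote the line endings.
    with open(received, "wb") as fp:
        fp.write(payload.replace(b"\n", b"\r\n"))

    digests_match = file_digest(sent) == file_digest(received)

print(digests_match)
```

In practice the digest would be computed separately on each machine and the two hex strings compared by hand or by script.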

I had a similar issue today: an object that was pickled on Linux failed to load on Windows because of CRLF line endings, as @nneonneo explains in his answer.

The problem was that git treated the .pkl objects as text files and normalized their line endings. So when I pushed the project with the binary .pkl files from the Linux machine and pulled them on the Windows machine, the line endings were CRLF instead of Unix line endings. The solution was to add a .gitattributes file to the repo with

*.pkl binary

to force git not to manipulate the files at all.

Then, you can "refresh" the changes as suggested here. I deleted and cloned the project again on the windows machine instead.

Nagasaki45