
I'm trying to add several files to a zip archive with Python's zipfile library. The problem is the name under which a file is stored in the archive: it contains special (non-ASCII, UTF-8) characters.

Here is a basic example:

#!/usr/bin/env python

import zipfile

infilename = "test_file"
outfilename = "test.zip"
filename = u'Conf\xe9d\xe9ration.txt'

if __name__ == '__main__':
    f = open(outfilename, "wb")  # a zip archive is binary data: open in "wb" mode
    archive = zipfile.ZipFile(f, "w", zipfile.ZIP_DEFLATED)
    archive.write(infilename, filename.encode("CP437"))
    archive.close()
    f.close()

The generated file is not read correctly by every zip extractor:

  • Ubuntu 10.04 & 11.10: Conf?d?ration.txt
    The file could not be extracted: "caution: filename not matched: Conf?d?ration.txt"

  • Windows XP & 7: Confédération.txt
    The file could be read.

  • Mac OS X (Lion): ConfÇdÇration.txt
    The file could be read.
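The mismatch above can be reproduced without any zip tool: the archive only stores bytes, and each extractor picks its own codec to turn them back into a name. The Python 3 sketch below encodes the name exactly as the first attempt does (CP437) and then decodes those bytes two ways. Reading them as Mac Roman is an assumption about what the Lion extractor does, but it yields exactly the "ConfÇdÇration.txt" reported above, because byte 0x82 is "é" in CP437 and "Ç" in Mac Roman.

```python
# The name from the question, and the bytes the first attempt stores in the
# archive (filename.encode("CP437")).
name = "Conf\u00e9d\u00e9ration.txt"
stored = name.encode("cp437")

# A reader that decodes the stored bytes as CP437 gets the original name back...
windows_view = stored.decode("cp437")
# ...while a reader assumed to use Mac Roman maps byte 0x82 to a different letter.
mac_view = stored.decode("mac_roman")

print(stored)        # b'Conf\x82d\x82ration.txt'
print(windows_view)  # Confédération.txt
print(mac_view)      # ConfÇdÇration.txt
```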

I also tried without encoding to CP437, changing just one line to:

    archive.write(infilename, filename)

This time Ubuntu still has the same problem, Windows shows "Conf+®d+®ration.txt", and Mac OS X works perfectly.
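With a unicode arcname, CPython 2.7's zipfile (like Python 3's) stores a non-ASCII name as UTF-8 and sets general purpose bit 11, the UTF-8 flag, so "é" becomes the two bytes C3 A9. The Python 3 sketch below shows those bytes and two ways of reading them. Decoding with cp850 models an extractor that ignores the flag and uses a Western European OEM code page; that is an assumption about Windows, not something the question confirms, but the result "Conf├®d├®ration.txt" is close to the reported "Conf+®d+®ration.txt" if "├" is displayed as "+".

```python
# The bytes stored by the second attempt: a non-ASCII unicode name is
# written as UTF-8, so each "é" becomes the two bytes C3 A9.
name = "Conf\u00e9d\u00e9ration.txt"
stored = name.encode("utf-8")

# A reader that honours the UTF-8 flag recovers the name exactly.
utf8_view = stored.decode("utf-8")
# A reader assumed to ignore the flag and use the OEM code page cp850
# sees two characters per "é".
cp850_view = stored.decode("cp850")

print(stored)      # b'Conf\xc3\xa9d\xc3\xa9ration.txt'
print(utf8_view)   # Confédération.txt
print(cp850_view)  # Conf├®d├®ration.txt
```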

Does anyone know a (Pythonic) cross-platform solution?

Dharman
samb

1 Answer


It looks like the file name is written as is (i.e. the first time in CP437 encoding, and the second time in UTF-8), while other archive handlers each take a different approach:

  • Windows: it uses the DOS/OEM encoding for file names inside the archive, which is why CP437 works. This behavior is described in the PKWARE standard.
  • Mac OS: it silently uses UTF-8, which violates the standard. That's why UTF-8 works on Mac OS.
  • Linux/Unix: they use the system code page for file names inside the archive. I don't know which one your Linux installation is configured for, but it is neither the DOS encoding nor UTF-8 :)
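To see what a given archive actually stores, here is a small Python 3 sketch; the helper name `describe_members` is hypothetical. For each member it reports the decoded name, whether general purpose bit 11 (the UTF-8 "language encoding flag" of the PKWARE specification) is set, and the raw name bytes. It relies on Python 3's zipfile decoding flagged names as UTF-8 and all other names as CP437 by default, so re-encoding with the same codec recovers the stored bytes. Whether an extractor honours bit 11 is up to that extractor, which is the variation described in the list above.

```python
import io
import zipfile

# General purpose bit 11: when set, the stored file name is UTF-8;
# when clear, readers traditionally assume the DOS code page CP437.
UTF8_NAME_FLAG = 0x800


def describe_members(zip_file):
    """Return (name, utf8_flag_set, stored_bytes) for each archive member.

    zip_file may be a path or a binary file object, as accepted by ZipFile.
    """
    result = []
    with zipfile.ZipFile(zip_file) as archive:
        for info in archive.infolist():
            utf8 = bool(info.flag_bits & UTF8_NAME_FLAG)
            # orig_filename is the name as decoded from the archive, before
            # zipfile normalises path separators; re-encoding it with the
            # codec zipfile used gives back the exact stored bytes.
            stored = info.orig_filename.encode("utf-8" if utf8 else "cp437")
            result.append((info.filename, utf8, stored))
    return result


if __name__ == "__main__":
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        # A non-ASCII str name is stored as UTF-8 with bit 11 set;
        # a pure-ASCII name is stored as is, with the flag clear.
        archive.writestr("Conf\u00e9d\u00e9ration.txt", b"hello\n")
        archive.writestr("plain.txt", b"ascii name\n")
    buf.seek(0)
    for entry in describe_members(buf):
        print(entry)
```

Running `describe_members` on the archives produced by each of the two attempts (under Python 2.7 on the asker's machine) would show which bytes, and which flag, each extractor was given.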
Nickolay Olshevsky