Edit: Here's code that works for me with Python 2.7 but not with 3.6 (a bit of a mystery, it seemed to work earlier this evening):
$ cat zipf.py
from __future__ import print_function
from zipfile import ZipFile, ZipInfo
with ZipFile("out.zip", 'w') as zf:
content = "content"
info = ZipInfo()
info.filename = "file.txt"
info.flag_bits = 0x800
# don't set info.file_size here: zf.writestr() does that
zf.writestr(info, content)
with open('out.zip', 'rb') as stream:
byteseq = stream.read(8)
for i in byteseq:
if isinstance(i, str): i = ord(i)
print('{:02x}'.format(i), end=' ')
print()
Run as:
$ python2.7 zipf.py
50 4b 03 04 14 00 00 08
but:
$ python3.6 zipf.py
50 4b 03 04 14 00 00 00
It's certainly possible to make it work, by making sure the file is opened before creating the info
entry. However, then you must avoid writestr
, and this only works with Python 3.6 (and seems rather abusive):
from __future__ import print_function
from zipfile import ZipFile, ZipInfo
with ZipFile("out.zip", 'w') as zf:
info = ZipInfo()
info.filename = "file.txt"
content = "content"
if not isinstance(content, bytes):
content = content.encode('utf8')
info.file_size = len(content)
with zf.open(info, 'w') as stream:
info.flag_bits = 0x800
stream.write(content)
with open('out.zip', 'rb') as stream:
byteseq = stream.read(8)
for i in byteseq:
if isinstance(i, str): i = ord(i)
print('{:02x}'.format(i), end=' ')
print()
It's probably the case that 3.6 resetting all the info.flag_bits
(through the internal open
that it does) is just incorrect, although it's not really clear to me.
Original answer below
I cannot reproduce this, but you're right that bit 11 in the flag bits is set if the file name is Unicode and encoding as ASCII fails:
def _encodeFilenameFlags(self):
if isinstance(self.filename, unicode):
try:
return self.filename.encode('ascii'), self.flag_bits
except UnicodeEncodeError:
return self.filename.encode('utf-8'), self.flag_bits | 0x800
else:
return self.filename, self.flag_bits
(Python 2.7 zipfile.py source) or:
def _encodeFilenameFlags(self):
try:
return self.filename.encode('ascii'), self.flag_bits
except UnicodeEncodeError:
return self.filename.encode('utf-8'), self.flag_bits | 0x800
(Python 3.6 zipfile.py source).
To get the bit set you need a filename that cannot be encoded directly in ASCII, e.g.:
info.filename = u"sch\N{latin small letter o with diaeresis}n" # "file.txt"
(this notation works with both Python 2.7 and 3.6).
I tried to force this bit on by setting the flag after creating the ZipInfo object, but it gets reset back to 0x00 in _open_to_write().
If I add:
info.filename = "file.txt"
info.flag_bits |= 0x0800
(just after setting the filename to u"schön"
) and run this under Python 2.7 or 3.6, I get the bit set in the header (of course the file name in the zip directory changes back to file.txt
).