
I have seen several differing opinions on this.

I don't see anything in the latest docs (3.9.2).

Can I safely read multiple different entries in a ZipFile in parallel?

I have seen some unusual errors like "Error -3 while decompressing data: invalid stored block lengths," and I am wondering if they are because I'm reading entries in parallel.

EDIT: Please don't close this as a duplicate of "Is python zipfile thread-safe?". If you read only the title, you'd think it is a duplicate. But the actual question there asks about writing zip files (even though writing zip files is inherently not really parallelizable). This question asks about reading zip files.

Dharman
Paul Draper

2 Answers


By the looks of it, reading is at least intended to be thread-safe: the actual data I/O goes through a _SharedFile object, which takes the ZipFile-level lock for every read and maintains its own private position:

def read(self, n=-1):
    with self._lock:
        if self._writing():
            raise ValueError("Can't read from the ZIP file while there "
                    "is an open writing handle on it. "
                    "Close the writing handle before trying to read.")
        self._file.seek(self._pos)
        data = self._file.read(n)
        self._pos = self._file.tell()
        return data

You can also check the _seekable attribute of the ZipFile, but normally it's going to be True.

tevemadar

At the time of this question, reading from a ZipFile in CPython was not thread-safe: https://bugs.python.org/issue42369

However, this has since been fixed, and the fix was backported to 3.9 and later: https://github.com/python/cpython/pull/26974
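If you are stuck on an interpreter that predates the fix, one conservative workaround is to give each worker its own ZipFile handle so that no file position is shared between threads. This is my own suggestion rather than something the linked issue prescribes. A minimal sketch, with made-up entry names and an in-memory archive so that it runs on its own:

```python
import io
import zipfile
from concurrent.futures import ThreadPoolExecutor

# Build a small archive in memory so the sketch is self-contained.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    for i in range(8):
        archive.writestr(f"entry_{i}.txt", f"payload {i}\n" * 1000)
archive_bytes = buffer.getvalue()


def read_entry(name):
    # Each call opens its own ZipFile over its own BytesIO, so threads
    # never share an underlying file position.
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return name, archive.read(name)


names = [f"entry_{i}.txt" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(read_entry, names))
```

With an archive on disk, the equivalent would be calling zipfile.ZipFile(path) inside the worker. The cost is re-reading the central directory once per handle, which is usually small compared with decompressing the entries.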

Paul Draper