
I want to zip a stream and stream out the result. I'm doing it in AWS Lambda, which matters in terms of available disk space and other restrictions. I'm going to use the zipped stream to write an AWS S3 object using upload_fileobj() or put(), if that matters.

I can create an archive as a file as long as the objects are small:

import zipfile
zf = zipfile.ZipFile("/tmp/byte.zip", "w")
zf.writestr(filename, my_stream.read())
zf.close()

For larger amounts of data I can create an in-memory object instead of a file:

from io import BytesIO
...
byte = BytesIO()
zf = zipfile.ZipFile(byte, "w")
....

but how can I pass the zipped stream to the output? If I call zf.close(), the stream will be closed; if I don't, the archive will be incomplete.
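For context, here is a minimal sketch of the fully buffered in-memory variant (the file name, payload, bucket and key are placeholders; the boto3 call is commented out for that reason). One detail worth noting: closing the ZipFile finalizes the archive but does not close a BytesIO you passed in, since zipfile only closes file objects it opened itself. The real limitation of this approach is that the whole archive sits in memory:

```python
import zipfile
from io import BytesIO

# Placeholder stream standing in for my_stream
my_stream = BytesIO(b"example payload")
filename = "my-file.txt"  # placeholder name

byte = BytesIO()
with zipfile.ZipFile(byte, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr(filename, my_stream.read())
# The archive is now complete, and `byte` is still open
byte.seek(0)  # rewind so a reader starts at byte 0

# boto3.client("s3").upload_fileobj(byte, "my-bucket", "byte.zip")  # placeholder bucket/key
```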

Jonathan Hall
Putnik

2 Answers


Instead of using Python's built-in zipfile, you can use stream-zip (full disclosure: written by me).

If you have an iterable of bytes, say my_data_iter, you can get an iterable of a zip file using its stream_zip function:

from datetime import datetime
from stream_zip import stream_zip, ZIP_64

def files():
    modified_at = datetime.now()
    perms = 0o600
    yield 'my-file-1.txt', modified_at, perms, ZIP_64, my_data_iter

my_zip_iter = stream_zip(files())

If you need a file-like object, say to pass to boto3's upload_fileobj, you can convert from the iterable with a transformation function:

def to_file_like_obj(iterable):
    chunk = b''
    offset = 0
    it = iter(iterable)

    def up_to_iter(size):
        # Yield bytes from the iterable, at most `size` in total, remembering
        # the position within the current chunk between calls
        nonlocal chunk, offset

        while size:
            if offset == len(chunk):
                # Current chunk exhausted: fetch the next one, or stop at EOF
                try:
                    chunk = next(it)
                except StopIteration:
                    break
                else:
                    offset = 0
            to_yield = min(size, len(chunk) - offset)
            offset = offset + to_yield
            size -= to_yield
            yield chunk[offset - to_yield:offset]

    class FileLikeObj:
        def read(self, size=-1):
            # read(), read(-1) and read(None) drain everything, like a real file
            return b''.join(up_to_iter(float('inf') if size is None or size < 0 else size))

    return FileLikeObj()

my_file_like_obj = to_file_like_obj(my_zip_iter)
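The converter isn't tied to stream-zip: any iterable of bytes works. A quick sanity check (with the converter repeated verbatim so the snippet runs on its own, and stand-in chunks rather than real zip data) shows the file-like read() semantics that upload_fileobj expects:

```python
def to_file_like_obj(iterable):
    # Same converter as above, repeated so this snippet is self-contained
    chunk = b''
    offset = 0
    it = iter(iterable)

    def up_to_iter(size):
        nonlocal chunk, offset
        while size:
            if offset == len(chunk):
                try:
                    chunk = next(it)
                except StopIteration:
                    break
                else:
                    offset = 0
            to_yield = min(size, len(chunk) - offset)
            offset = offset + to_yield
            size -= to_yield
            yield chunk[offset - to_yield:offset]

    class FileLikeObj:
        def read(self, size=-1):
            return b''.join(up_to_iter(float('inf') if size is None or size < 0 else size))

    return FileLikeObj()

f = to_file_like_obj([b'abc', b'defgh', b'ij'])  # stand-in chunks
print(f.read(4))   # b'abcd' - a read can span chunk boundaries
print(f.read())    # b'efghij' - no size argument drains the rest
print(f.read(4))   # b'' - empty bytes at EOF, as with a real file
```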
Michal Charemza
    Thanks for writing this library, super helpful and exactly what I was looking for! – michael May 27 '22 at 16:11
  • stream-zip is an amazing library. Super powerful and fast. I recommended it in another question at https://stackoverflow.com/a/76529180/8874388 for anyone who wants to know more about it. – Mitch McMabers Jun 22 '23 at 09:53

You might like to try the zipstream version of zipfile. For example, to compress stdin to stdout as a zip archive holding the data as a single file named TheLogFile, using iterators:

#!/usr/bin/python3
import sys, zipstream
with zipstream.ZipFile(mode='w', compression=zipstream.ZIP_DEFLATED) as z:
    z.write_iter('TheLogFile', sys.stdin.buffer)
    for chunk in z:
        sys.stdout.buffer.write(chunk)
meuh
  • The key here is "holding the data as a file". I do not want to use a file due to environment limitations. How should it look then? – Putnik Apr 04 '19 at 13:12
  • I wasn't clear. I just meant that the final output is a stream that, were you to save it to a file, would appear to be a zipfile. If you were to unzip it, you would get a file called `TheLogFile` containing whatever data you read from stdin. The only file is the nominal one that is part of the output stream format. Look at the `webpy` example at the end of the link, as that seems to be similar to your situation. – meuh Apr 04 '19 at 13:56
  • got it, thank you. Another question: the indentation looks a bit messy; does the `with zipstream ...` block contain only `z.write_iter...`, or the `for chunk...` loop too? – Putnik Apr 04 '19 at 17:55