How would you organize the process of zipping objects that reside in object storage?
For context, our users sometimes request an export of all their data from the app - think of Twitter's "Downloading Twitter archive" feature.
Our users can upload files, so the exported data must include the files stored in object storage (Google Cloud Storage). The requested data must be packed into a single .zip archive.
A naive approach would look like this:
- download all files from object storage to a local disk,
- zip all files into an archive,
- upload the .zip back to object storage,
- send the user a link to download the .zip file.
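For concreteness, the naive approach above can be sketched roughly as follows. This is only an illustration under assumptions: `ObjectStore`, `InMemoryStore`, and `naive_export` are hypothetical names made up for this sketch and are not the Google Cloud Storage client API. The in-memory store exists only so the example runs without cloud credentials. Sending the link is left out; the function just returns the key of the uploaded archive.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Dict, Iterable, Protocol


class ObjectStore(Protocol):
    """Hypothetical minimal storage interface (NOT the real GCS client API)."""

    def list_keys(self, prefix: str) -> Iterable[str]: ...
    def download_to_file(self, key: str, dest: Path) -> None: ...
    def upload_from_file(self, key: str, src: Path) -> None: ...


class InMemoryStore:
    """Stand-in store used only to make the sketch runnable locally."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = dict(objects)

    def list_keys(self, prefix: str) -> Iterable[str]:
        return sorted(k for k in self.objects if k.startswith(prefix))

    def download_to_file(self, key: str, dest: Path) -> None:
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(self.objects[key])

    def upload_from_file(self, key: str, src: Path) -> None:
        self.objects[key] = src.read_bytes()


def naive_export(store: ObjectStore, user_prefix: str, archive_key: str) -> str:
    """Download everything under user_prefix, zip it, upload the archive.

    Assumes user_prefix ends with "/" so each key maps to a relative path.
    """
    with tempfile.TemporaryDirectory() as tmp:
        workdir = Path(tmp)
        files_dir = workdir / "files"

        # Step 1: download every object to local disk.
        downloaded = []
        for key in store.list_keys(user_prefix):
            arcname = key[len(user_prefix):]
            dest = files_dir / arcname
            store.download_to_file(key, dest)
            downloaded.append((dest, arcname))

        # Step 2: zip all downloaded files into one archive.
        archive_path = workdir / "export.zip"
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for path, arcname in downloaded:
                zf.write(path, arcname)

        # Step 3: upload the archive back to object storage.
        store.upload_from_file(archive_key, archive_path)

    # Step 4: the caller would turn this key into a download link.
    return archive_key
```

Note that the temporary directory holds both the downloaded files and the finished archive at the same time, and nothing is persisted between steps.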
However, this approach has several disadvantages:
- the files of even a single user sometimes add up to gigabytes,
- if the zipping process is interrupted, it has to start over from the beginning.
What's a reasonable way to design a process for generating a .zip archive of user files that originally reside in object storage?