I have a bunch of files inside an Amazon S3 bucket. I want to zip those files and download the contents via an S3 URL, using Java Spring.
- Could you please clarify your requirements? What do you mean by "zip those amazon URLs into Zip"? Do you mean you wish to create a new object in an Amazon S3 bucket that consists of a list of URLs? Or do you wish to create a Zip file from several existing files? Please edit your question to provide more information so that we can assist you. – John Rotenstein Apr 07 '17 at 10:52
- Sir, I have huge files in an Amazon S3 bucket. I just want to create a Zip file from those files and get it as a single file directly from the bucket. – jeff ayan Apr 07 '17 at 13:19
4 Answers
S3 is not a file server, nor does it offer operating system file services, such as data manipulation.
If there are many huge files, your best bet is to:
- start a simple EC2 instance
- download all those files to the EC2 instance, compress them, and re-upload the archive back to the S3 bucket under a new object key (a Java sketch of this flow follows at the end of this answer)
Yes, you can use AWS Lambda to do the same thing, but Lambda is bound to a 900-second (15 minute) execution timeout (it is therefore recommended to allocate more RAM to boost Lambda execution performance).
Traffic from S3 to an EC2 instance or other services in the same region is free.
If your main purpose is just to read those files within the same AWS region using EC2 or other services, then you don't need this extra step. Just access the files directly.
(Update): As mentioned by @Robert Reiz, you can now also use AWS Fargate to do the job.
Note:
It is recommended to access and share files using the AWS API. If you intend to share the files publicly, you must take security seriously and impose download restrictions. AWS traffic out to the internet is never cheap.
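Since the question asks for Java, here is a minimal sketch of the download-compress-reupload flow described above, using the AWS SDK for Java v2. The bucket name, key prefix, and output key are placeholders (not from the answer), and the job is assumed to run on the EC2 instance with enough disk space for a temp file:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.ListObjectsV2Request;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.model.S3Object;

public class S3ZipJob {
    public static void main(String[] args) throws IOException {
        String bucket = "my-bucket";               // placeholder bucket name
        String prefix = "files-to-zip/";           // placeholder key prefix of the source objects
        String zipKey = "archives/bundle.zip";     // placeholder key for the resulting archive

        Path zipFile = Files.createTempFile("bundle", ".zip");

        try (S3Client s3 = S3Client.create()) {
            // Stream every matching object straight into a zip entry on local disk
            try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile))) {
                for (S3Object obj : s3.listObjectsV2Paginator(
                        ListObjectsV2Request.builder().bucket(bucket).prefix(prefix).build()).contents()) {
                    zos.putNextEntry(new ZipEntry(obj.key()));
                    try (var in = s3.getObject(
                            GetObjectRequest.builder().bucket(bucket).key(obj.key()).build())) {
                        in.transferTo(zos);
                    }
                    zos.closeEntry();
                }
            }

            // Re-upload the archive to the same bucket under a new object key
            s3.putObject(PutObjectRequest.builder().bucket(bucket).key(zipKey).build(),
                    RequestBody.fromFile(zipFile));
        }
    }
}

Writing the archive to a temp file on the instance's disk rather than into memory keeps RAM usage low even when the source objects are very large.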

- Lambda execution timeout settings can be set up to 15 mins, not 300 seconds, as I can see on the dashboard. – pankaj Mar 03 '19 at 20:42
- EC2 is one of the most expensive services on AWS. I would recommend ECS Fargate, because it has all the advantages of EC2 but costs much less. If you need to run these kinds of tasks regularly, you can even create a scheduled task on ECS Fargate, which will trigger a Docker container every X hours or days. – Robert Reiz Sep 10 '20 at 09:03
Zip them on your end instead of doing it in AWS, ideally in the frontend, directly in the user's browser. You can stream the download of several files in JavaScript, use that stream to create a zip, and save this zip to the user's disk.
The advantages of moving the zipping to the frontend:
- You can use it with S3 URLs, a bunch of presigned links, or even mix content from different sources: some from S3, some from wherever else.
- You don't waste Lambda memory or have to spin up an EC2 or Fargate instance, which saves money. Let the user's computer do it for you.
- It improves the user experience: there is no need to wait for the zip to be created before downloading it; the download starts while the zip is being created.
StreamSaver is useful for this purpose, but in their zipping examples (Saving multiple files as a zip) it is limited to files smaller than 4GB because it doesn't implement zip64. You can combine StreamSaver with client-zip, which supports zip64, with something like this (I haven't tested this):
import { downloadZip } from 'client-zip';
import streamSaver from 'streamsaver';

const files = [
  {
    name: 'file1.txt',
    input: await fetch('test.com/file1')
  },
  {
    name: 'file2.txt',
    input: await fetch('test.com/file2')
  },
];

downloadZip(files).body.pipeTo(streamSaver.createWriteStream('final_name.zip'));
In case you choose this option, keep in mind that if you have CORS enabled on your bucket, you will need to add the frontend URL where the zipping is done to the AllowedOrigins field of your bucket's CORS configuration.
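If you manage the bucket from a Java backend (as the question suggests), one way to add that origin is a sketch like the following with the AWS SDK for Java v2; the bucket name and origin are placeholders, and the same rule can just as well be added through the S3 console's CORS editor:

import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.CORSConfiguration;
import software.amazon.awssdk.services.s3.model.CORSRule;
import software.amazon.awssdk.services.s3.model.PutBucketCorsRequest;

public class SetBucketCors {
    public static void main(String[] args) {
        String bucket = "my-bucket";                        // placeholder bucket name
        String frontendOrigin = "https://app.example.com";  // placeholder: origin serving the zipping page

        try (S3Client s3 = S3Client.create()) {
            // Allow the frontend origin to GET objects from the bucket cross-origin
            s3.putBucketCors(PutBucketCorsRequest.builder()
                    .bucket(bucket)
                    .corsConfiguration(CORSConfiguration.builder()
                            .corsRules(CORSRule.builder()
                                    .allowedOrigins(frontendOrigin)
                                    .allowedMethods("GET", "HEAD")
                                    .allowedHeaders("*")
                                    .build())
                            .build())
                    .build());
        }
    }
}

Restricting AllowedOrigins to the exact frontend origin, rather than a wildcard, keeps arbitrary sites from fetching the bucket's objects cross-origin.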
About performance: as @aviv-day points out in a comment, this may not be suitable for all scenarios. The client-zip library has a benchmark that can give you an idea of whether it fits your scenario or not. Generally, if you have a big set of small files (I don't have a number for what counts as big here, but let's say somewhere between 100 and 1000), just zipping them will take a lot of time and will drain the end user's CPU. Also, if you are offering the same set of files zipped to all users, it's better to zip it once and serve it already zipped. Zipping in the frontend works well with a small group of files that can change dynamically depending on the user's preferences about what to download. I haven't really tested this, and I think the bottleneck would be the network speed rather than the zipping itself, since it happens on the fly, so I don't really think a scenario with a big set of files would actually be a problem. If anyone has benchmarks about this, it would be nice to share them with us!

- Of course it depends on your scenario, and you are right, it doesn't fit the scenario you describe. It does fit multiple other scenarios where the user just dynamically chooses a more limited bunch of files to download. For example, an invoice scenario where you can choose one per month in the range of years the user wants, or any scenario where you have a list of 10 files, 1 GB per file, and the user selects which ones to download. I'll edit the answer to be more specific on that. – javrd Sep 04 '22 at 10:47
Hi, I recently had to do this for my application: serve a bundle of files in zip format through a URL link that users can download.
In a nutshell: first create a buffer object using BytesIO, then use ZipFile to write into this buffer by iterating over all the S3 objects, then upload the zip object with the put method and create a presigned URL for it.
The code I used looks like this:
First, call this function to get the zip buffer. ObjectKeys are the keys of the S3 objects that you need to put into the zip file.
import os
import zipfile
from io import BytesIO

def zipResults(bucketName, ObjectKeys):
    # S3Helper is this answer's own wrapper around the boto3 get_object call
    buffer = BytesIO()
    with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as zip_file:
        for ObjectKey in ObjectKeys:
            objectContent = S3Helper().readFromS3(bucketName, ObjectKey)
            fileName = os.path.basename(ObjectKey)
            zip_file.writestr(fileName, objectContent)
    buffer.seek(0)
    return buffer
Then call this function; key is the key you give to your zip object:
import logging
from botocore.exceptions import ClientError

def uploadObject(bucketName, body, key):
    # AwsHelper is this answer's own wrapper that returns a boto3 S3 client
    s3client = AwsHelper().getClient("s3")
    try:
        response = s3client.put_object(
            Bucket=bucketName,
            Body=body,
            Key=key
        )
    except ClientError as e:
        logging.error(e)
        return None
    return response
Of course, you would need io, zipfile and boto3 modules.
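The answer mentions creating a presigned URL for the uploaded zip object but doesn't show that step. Since the question asks for Java, here is a minimal sketch of it using the AWS SDK for Java v2 presigner; the bucket name, key, and link lifetime are placeholders:

import java.time.Duration;

import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;

public class PresignZipUrl {
    public static void main(String[] args) {
        String bucket = "my-bucket";              // placeholder bucket name
        String zipKey = "archives/bundle.zip";    // placeholder: key the zip was uploaded under

        try (S3Presigner presigner = S3Presigner.create()) {
            GetObjectPresignRequest request = GetObjectPresignRequest.builder()
                    .signatureDuration(Duration.ofMinutes(15))   // how long the link stays valid
                    .getObjectRequest(GetObjectRequest.builder()
                            .bucket(bucket).key(zipKey).build())
                    .build();
            String url = presigner.presignGetObject(request).url().toString();
            System.out.println(url);  // hand this URL back to the client to download the zip
        }
    }
}

The resulting URL can be returned from a Spring controller so the client downloads the zip directly from S3 without the download passing through your server.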

If you need individual files (objects) in S3 compressed, then it is possible to do so in a round-about way. You can define a CloudFront endpoint pointing to the S3 bucket, then let CloudFront compress the content on the way out: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/ServingCompressedFiles.html

- This won't work on files larger than 10MB. Is there any other automated way to serve compressed files on AWS? – digitaldavenyc Oct 02 '19 at 22:43
- Just spitballing here, but you could create an API Gateway, send a request to a Lambda function that could process the files (I think you're granted 5GB tmp space to do file processing), copy the archive back to the S3 bucket via Lambda, determine that path, and return the download URL of that path as the response to the client (via the gateway). – Andrew Jan 28 '20 at 15:39
- Sorry, that should be 500MB tmp space, not 5GB, although one training I did said 5GB... Never tested it, so I don't know what would happen. – Andrew Jan 28 '20 at 15:52