
I have HTML data stored in gzipped form in an S3 bucket that is served as a static site. The site works fine in the browser, which knows to decompress it, but when I download the file with the AWS CLI the raw data stays gzipped on disk (rather than being decompressed after the download), so it comes out garbled when opened in a text editor or browser.

I've tried explicitly passing the content encoding to convert the gzipped content on S3 to plain text, but the downloaded file still contains the gzipped bytes rather than the raw UTF-8. Here is the command I've tried:

aws s3 cp s3://mys3bucket.com/index.html ./test.html --content-encoding "gzip" --content-type "text/html"
Eternal Rubyist

2 Answers


You can decompress the data yourself after downloading (or while downloading). The AWS CLI always writes the raw object bytes and does not apply the Content-Encoding transformation for you, so the --content-encoding and --content-type flags won't help on a download. On a Unix variant you can decompress on the fly by telling the CLI to write the object to stdout (use - as the destination) and piping it through zcat like this:

aws s3 cp s3://mys3bucket.com/index.html - | zcat > test.html

You can also store the compressed data in a file first and decompress it later.

It would be nonsense to uncompress it on the S3 side because then you'd have to transmit way more data (the uncompressed version).
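If you'd rather do this from Python, here is a minimal sketch using boto3 (assuming boto3 is installed; the bucket and key are taken from the question):

import gzip
import boto3

# Fetch the gzipped object and decompress it locally.
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="mys3bucket.com", Key="index.html")
html = gzip.decompress(obj["Body"].read())

with open("test.html", "wb") as f:
    f.write(html)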

Alfe

This is how I download files from AWS S3 and decompress them in Python.

uncompress.py

import os
import gzip

# Decompress the gzip files in the downloaded folder, recursing into
# subfolders. Files are overwritten in place, so there is no need to
# create a separate folder for the decompressed output.

def unCompress(ROOT):
    for entry in os.listdir(ROOT):
        path = os.path.join(ROOT, entry)
        if os.path.isdir(path):
            unCompress(path)
        else:
            # Read the compressed bytes, then overwrite the file
            # with its decompressed content.
            with open(path, 'rb') as f:
                data = f.read()
            with open(path, 'wb') as f:
                f.write(gzip.decompress(data))
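If the files are large, a streaming variant avoids reading each file fully into memory. Here is a sketch (unCompressFile is a hypothetical helper, not part of the code above) using gzip.open and shutil.copyfileobj:

import gzip
import os
import shutil
import tempfile

# Decompress one file in place without loading it all into memory:
# stream the decompressed bytes into a temp file, then replace the original.
def unCompressFile(path):
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path))
    with gzip.open(path, 'rb') as src, os.fdopen(fd, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    os.replace(tmp, path)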

main.py

import os
from uncompress import unCompress

FOLDER_NAME = "myProject"                    # folder to download from AWS S3
LOCAL_PATH = "./downloads/" + FOLDER_NAME    # local path for the downloaded files

cmd = "aws s3 cp s3://bucketName/" + FOLDER_NAME + " " + LOCAL_PATH + " --recursive --quiet"
result = os.system(cmd)

if result != 0:
    print('Error: download failed')
else:
    unCompress(LOCAL_PATH)  # decompress the downloaded files
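As a side note, subprocess.run is usually a safer way to shell out than os.system, since it avoids shell quoting issues and can raise on failure. A minimal sketch with the same hypothetical bucket and folder names:

import subprocess

FOLDER_NAME = "myProject"
LOCAL_PATH = "./downloads/" + FOLDER_NAME

# check=True raises CalledProcessError if the aws command fails.
subprocess.run(
    ["aws", "s3", "cp", "s3://bucketName/" + FOLDER_NAME, LOCAL_PATH,
     "--recursive", "--quiet"],
    check=True,
)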
 
Avinash