Below is the code I am using to read a gz file from S3:

import json
import boto3
from io import BytesIO
import gzip

def lambda_handler(event, context):
    try:
        s3 = boto3.resource('s3')
        key = 'test.gz'
        obj = s3.Object('athenaamit', key)
        n = obj.get()['Body'].read()
        #print(n)
        gzip = BytesIO(n)
        gzipfile = gzip.GzipFile(fileobj=gzip)
        content = gzipfile.read()
        print(content)
        return 'dddd'
    except Exception as e:
        print(e)
        raise e

But I am getting the error below:

 "errorMessage": "'_io.BytesIO' object has no attribute 'GzipFile'",
 "stackTrace": [
 "  File \"/var/task/lambda_function.py\", line 20, in lambda_handler\n    raise e\n",
"  File \"/var/task/lambda_function.py\", line 14, in lambda_handler\n    gzipfile = gzip.GzipFile(fileobj=gzip)\n"

Python version: 3.7

I also tried the suggestion from https://stackoverflow.com/questions/32794837/pass-io-bytesio-object-to-gzip-gzipfile-and-write-to-gzipfile

but that did not work for me either. How can I read the content of the file?

Amit
  • 2
    You have a conflict in your naming conventions. Change the variable name assignment for `gzip = BytesIO(n)` to a different variable name. As written you are overwriting the functionality of the `gzip` module by naming a variable `gzip` in your code. – vielkind Nov 30 '18 at 13:25
  • @vielkind thanks, that was a silly mistake – Amit Nov 30 '18 at 13:54
  • Does this answer your question? [Reading contents of a gzip file from a AWS S3 in Python](https://stackoverflow.com/questions/41161006/reading-contents-of-a-gzip-file-from-a-aws-s3-in-python) – Victor Sergienko Aug 10 '22 at 00:18
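
The name clash described in the first comment can be reproduced without S3 or Lambda. The following is only a minimal sketch using the standard library: `payload`, `read_shadowed`, and `read_renamed` are made-up names for illustration, and the sample bytes stand in for whatever `obj.get()['Body'].read()` would return.

```python
import gzip
from io import BytesIO

# Hypothetical sample data standing in for the bytes read from S3.
payload = gzip.compress(b"hello\nworld\n")

def read_shadowed(data):
    # Assigning to `gzip` rebinds the name locally, hiding the gzip module.
    gzip = BytesIO(data)
    # `gzip` is now a BytesIO object, so this lookup raises AttributeError.
    return gzip.GzipFile(fileobj=gzip).read()

def read_renamed(data):
    # Using a different variable name leaves the gzip module reachable.
    buf = BytesIO(data)
    return gzip.GzipFile(fileobj=buf).read()

try:
    read_shadowed(payload)
    shadow_error = None
except AttributeError as exc:
    shadow_error = str(exc)

content = read_renamed(payload)
print(shadow_error)
print(content)
```

The only difference between the two functions is the variable name, which is why renaming `gzip = BytesIO(n)` is enough to fix the code in the question.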

1 Answer

Turning the comment above into a proper answer, the working code would be:

import gzip
import io

import boto3

s3 = boto3.resource('s3')
obj = s3.Object('my-bucket-name', 'path/to/file.gz')
buf = io.BytesIO(obj.get()["Body"].read())  # reads the whole gz file into memory
for line in gzip.GzipFile(fileobj=buf):
    ...  # do something with line (each line is a bytes object)

I was a bit worried about the memory footprint, but it seems that only the compressed gz file is kept in memory (the `buf = io.BytesIO(...)` line above), plus one decompressed line at a time inside the `for line` loop.

With a gz file of 38M I had a memory footprint of 47M (in virtual memory, VIRT in htop). The unzipped file was 308M.
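
The line-by-line reading can be tried locally without an S3 bucket. This is a minimal sketch rather than the exact code above: `iter_gzip_lines` is a hypothetical helper name, and `sample` is made-up data standing in for the bytes that `obj.get()["Body"].read()` would return.

```python
import gzip
import io

def iter_gzip_lines(raw_bytes):
    """Yield decompressed lines (as bytes) from gzip-compressed bytes."""
    buf = io.BytesIO(raw_bytes)  # the compressed data stays in memory
    with gzip.GzipFile(fileobj=buf) as gz:
        for line in gz:  # data is decompressed as the loop advances
            yield line

# Hypothetical sample input; in the S3 case this would be the object body.
sample = gzip.compress(b"first\nsecond\nthird\n")
lines = list(iter_gzip_lines(sample))
print(lines)
```

Each line comes back as `bytes` with its trailing newline; call `.decode()` on it if text is needed, assuming you know the encoding of your data.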

hansaplast