
I have attached my hardcoded Python program below; it merges two JSON files from S3 storage, with each object key specified manually. Can someone please tell me how to read multiple input files (JSON files) from an S3 bucket automatically? I know this can be done in plain Python by globbing for *.json in the program's directory, but I don't understand how to do the same in AWS Lambda.

Python Code:

import glob
import json

result = []
for f in glob.glob("*.json"):
    with open(f, "r") as infile:
        result += json.load(infile)

with open("merge.json", "w") as outfile:
    json.dump(result, outfile)

In Lambda I am able to do this for two files; can someone please suggest how to do the same (i.e. take all the JSON files from S3 automatically)? Thanks in advance.

import boto3
import json

s3_client = boto3.client("s3")
S3_BUCKET = 'bucket-for-json-files'

def lambda_handler(event, context):
  object_key = "sample1.json"  # replace object key
  file_content = s3_client.get_object(Bucket=S3_BUCKET, Key=object_key)["Body"].read()
  print(file_content)
  object_key2 = "sample2.json"  # replace object key
  file_content2 = s3_client.get_object(Bucket=S3_BUCKET, Key=object_key2)["Body"].read()
  print(file_content2)
  result = []
  result += json.loads(file_content)
  result += json.loads(file_content2)
  print(result)
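
Rather than hardcoding object_key and object_key2, one way to generalize this is to list the bucket's keys first and then loop over them. A minimal sketch, assuming every object whose key ends in .json holds a JSON list like the local version above:

import boto3
import json

s3_client = boto3.client("s3")
S3_BUCKET = 'bucket-for-json-files'

def lambda_handler(event, context):
  result = []
  # List every object in the bucket (pagination handles more than 1000 keys)
  paginator = s3_client.get_paginator('list_objects_v2')
  for page in paginator.paginate(Bucket=S3_BUCKET):
    for item in page.get('Contents', []):
      key = item['Key']
      if not key.endswith('.json'):  # assumption: only merge .json objects
        continue
      body = s3_client.get_object(Bucket=S3_BUCKET, Key=key)["Body"].read()
      result += json.loads(body)
  print(result)
  return result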
  

I have followed the syntax from the documentation, but I still get a timeout error.

import boto3

# Create a client
client = boto3.client('s3', region_name='us-east-1')

# Create a reusable Paginator
paginator = client.get_paginator('list_objects')

# Create a PageIterator from the Paginator
page_iterator = paginator.paginate(Bucket='bucket-for-json-files')

for page in page_iterator:
    print(page['Contents'])
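
As an aside, the same paginator can also filter results client-side with a JMESPath expression via its search() method, which avoids checking each key by hand; a sketch, assuming only the .json objects are wanted:

import boto3

client = boto3.client('s3', region_name='us-east-1')
paginator = client.get_paginator('list_objects')
page_iterator = paginator.paginate(Bucket='bucket-for-json-files')

# JMESPath filter: flatten all pages and keep only keys ending in ".json"
for obj in page_iterator.search("Contents[?ends_with(Key, '.json')][]"):
    if obj is not None:  # pages with no Contents yield None
        print(obj['Key'])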

Getting a timeout error:

import boto3

s3_client = boto3.client("s3")
S3_BUCKET = 'bucket-for-json-files'

def iterate_bucket_items(bucket):
  # Reuse the module-level client; paginate to handle more than 1000 keys
  paginator = s3_client.get_paginator('list_objects_v2')
  page_iterator = paginator.paginate(Bucket=bucket)
  
  for page in page_iterator:
    if page['KeyCount'] > 0:
      for item in page['Contents']:
        yield item

for i in iterate_bucket_items(S3_BUCKET):
  print(i)
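
For completeness, in Lambda the consuming loop would usually live inside the handler rather than at module level (module-level code runs only once, during cold start); a minimal sketch reusing the generator above:

def lambda_handler(event, context):
  # Drain the generator on each invocation to list every key
  keys = [item['Key'] for item in iterate_bucket_items(S3_BUCKET)]
  print(keys)
  return keys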

I have solved the issue with the help of @JeremyThompson; my final code is attached here:

import json
import boto3

def lambda_handler(event, context):
  # Create a client
  client = boto3.client('s3', region_name='us-east-1')

  # Create a reusable Paginator
  paginator = client.get_paginator('list_objects')

  # Create a PageIterator from the Paginator
  page_iterator = paginator.paginate(Bucket='bucket-for-json-files')

  # Collect the object listings from every page
  result = []
  for page in page_iterator:
    result += page['Contents']

  # Print the key of every object in the bucket
  for i in result:
    print(i['Key'])

The above code prints the key of each JSON file in the bucket.
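
To finish the original goal of merging the files (not just printing their keys), a sketch of one possible extension: download each listed object, concatenate the parsed lists, and upload the result. The output key merge.json mirrors the local glob version and is my own choice; this assumes, as before, that every .json object holds a JSON list.

import json
import boto3

def lambda_handler(event, context):
  client = boto3.client('s3', region_name='us-east-1')
  bucket = 'bucket-for-json-files'

  # List every object in the bucket
  paginator = client.get_paginator('list_objects')
  result = []
  for page in paginator.paginate(Bucket=bucket):
    result += page.get('Contents', [])

  # Download and concatenate each JSON list, skipping the output file itself
  merged = []
  for obj in result:
    key = obj['Key']
    if key == 'merge.json' or not key.endswith('.json'):
      continue
    body = client.get_object(Bucket=bucket, Key=key)["Body"].read()
    merged += json.loads(body)  # assumes each file contains a JSON list

  # Write the combined list back to the bucket (output key is my own choice)
  client.put_object(Bucket=bucket, Key='merge.json',
                    Body=json.dumps(merged).encode('utf-8'))
  return {'items_merged': len(merged)}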
