We have a QLDB ingestion process that consists of a Lambda function triggered by SQS.
We want the pipeline to be airtight: if an error occurs during driver execution and the data fails to commit to QLDB, we don't want to lose that data.
In our testing, we noticed that if the Lambda itself fails, the message automatically goes back to the queue, but if the driver fails, the data is lost.
I understand that the driver's default behavior is to retry four times after the initial failure. My question is: if I wrap qldb_driver.execute_lambda() in a try statement, will the driver still retry on failure, or will the first failure immediately propagate and be handled by the except clause?
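To make the question concrete, here is a minimal sketch of the behavior I'm hoping for. It does not use pyqldb at all: FakeRetryingDriver and always_fails are hypothetical stand-ins, and the sketch simply assumes a driver that retries internally and only raises once its retries are exhausted. Whether the real driver behaves this way is exactly what I'm asking.

```python
class FakeRetryingDriver:
    """Hypothetical stand-in for a driver that retries internally (NOT the pyqldb API)."""

    def __init__(self, retry_limit=4):
        self.retry_limit = retry_limit
        self.attempts = 0

    def execute_lambda(self, fn):
        # One initial attempt plus up to retry_limit retries.
        for attempt in range(self.retry_limit + 1):
            self.attempts += 1
            try:
                return fn()
            except ConnectionError:
                if attempt == self.retry_limit:
                    # Retries exhausted: only now does the caller see the error.
                    raise


def always_fails():
    # Simulates a write that never succeeds.
    raise ConnectionError("simulated transient failure")


driver = FakeRetryingDriver(retry_limit=4)
caught = False
try:
    driver.execute_lambda(always_fails)
except ConnectionError:
    # Reached once, after the initial attempt and all four retries.
    caught = True

print(driver.attempts, caught)
```

In this stand-in, the outer try/except does not interfere with the retries, because the retry loop lives inside execute_lambda and the exception only escapes after the last attempt.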
Here is how I've written the first half of the function:
import json
import boto3
import datetime
from pyqldb.driver.qldb_driver import QldbDriver
from utils import upsert, resend_to_sqs, delete_from_sqs

queue_url = 'https://sqs.XXX/'
sqs = boto3.client('sqs', region_name='us-east-1')
ledger = 'XXXXX'
table = 'XXXXX'
qldb_driver = QldbDriver(ledger_name=ledger, region_name='us-east-1')

def lambda_handler(event, context):
    # Simple counter to identify messages
    i = 0
    # Error flag
    error = False
    # Empty list to store each message's status along with its body or receipt handle
    batch_messages = []
    for record in event['Records']:
        payload = json.loads(record["body"])
        payload['update_ts'] = str(datetime.datetime.now())
        try:
            qldb_driver.execute_lambda(lambda executor: upsert(executor, ledger=ledger, table_name=table, data=payload))
            # If the message commits successfully, give it status 200 and add the receipt handle to our list
            # so that, if an error occurs later, we can delete this message from the queue.
            message_info = {f'message_{i}': 200, 'receiptHandle': record['receiptHandle']}
            batch_messages.append(message_info)
        except Exception as e:
            print(e)
            # Flip error flag to True
            error = True
            # If the commit fails, set status 400 and add the message's body to our list.
            # This will allow us to send the message back to the queue during error handling.
            message_info = {f'message_{i}': 400, 'body': record['body']}
            batch_messages.append(message_info)
        i += 1
Assuming this try/except still allows the driver to retry on failure, I've written a second step that uses the recorded batch data to delete successfully committed messages from the queue and send the failed ones back:
    # Begin error handling
    if error:
        count = 0
        for j in range(len(batch_messages)):
            # If a message was sent successfully delete it from the queue
            if batch_messages[j][f'message_{j}'] == 200:
                receipt_handle = batch_messages[j]['receiptHandle']
                delete_from_sqs(sqs, queue_url, receipt_handle)
            # If the message failed to commit to QLDB, send it back to the queue
            else:
                body = batch_messages[j]['body']
                resend_to_sqs(sqs, queue_url, body)
                count += 1
        print(f"ERROR(S) DETECTED - {count} MESSAGES RETURNED TO QUEUE")
    else:
        print("BATCH PROCESSING SUCCESSFUL")
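For reference, the bookkeeping above boils down to splitting the batch into receipt handles to delete and bodies to resend. Below is a minimal sketch of that idea with no AWS or QLDB calls. The names partition_batch and fake_commit are hypothetical and not part of my actual function; fake_commit just stands in for the QLDB write and is assumed to raise on failure.

```python
def partition_batch(records, commit):
    """Split an SQS-style batch into receipt handles to delete and bodies to resend.

    `commit` is a hypothetical callable standing in for the QLDB write;
    it is assumed to raise an exception when the write fails.
    """
    to_delete, to_resend = [], []
    for record in records:
        try:
            commit(record["body"])
            # Committed: remember the receipt handle so the message can be deleted.
            to_delete.append(record["receiptHandle"])
        except Exception as exc:
            print(f"commit failed: {exc}")
            # Failed: remember the body so the message can be sent back to the queue.
            to_resend.append(record["body"])
    return to_delete, to_resend


def fake_commit(body):
    # Simulated write that fails for one specific payload.
    if body == "bad":
        raise RuntimeError("simulated commit failure")


records = [
    {"body": "ok-1", "receiptHandle": "rh-1"},
    {"body": "bad", "receiptHandle": "rh-2"},
    {"body": "ok-2", "receiptHandle": "rh-3"},
]
to_delete, to_resend = partition_batch(records, fake_commit)
print(to_delete, to_resend)
```

Keeping two plain lists avoids having to match each message's position to a message_{j} key during error handling.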
Thank you for your insight!