
I am using Lambdas and an SQS queue to delete data from DynamoDB. While developing this, I found that the only way to delete data from DynamoDB is to gather the items you want to delete and delete them in batches. At my current organization, most of the infrastructure is serverless, so I decided to build this piece following a serverless, event-driven architecture as well. In a nutshell, I post a message on the SQS queue to delete all items under a particular partition key. Once this message invokes my Lambda, I perform a listing call against DynamoDB for 1000 items and do the following:

  • Grab the cursor from this listing call and post another message to fetch the next 1000 items from that cursor:

    import { DynamoDBClient, QueryCommand } from '@aws-sdk/client-dynamodb';

    const dbClient = new DynamoDBClient(config);
    // Query returns a page of items plus a cursor (LastEvaluatedKey) for the next page.
    const { Items: records, LastEvaluatedKey } = await dbClient.send(
      new QueryCommand(fetchFirst1000ItemsForPrimaryKey),
    );
    postMessageToFetchNextItems(LastEvaluatedKey);
  • From the fetched 1000 items:
    • I create batches of 20 items and issue a set of messages for another Lambda to delete them. Batches of 20 are posted until all 1000 items have been queued for deletion.

      for (let i = 0; i < records.length; i += 20) {
        // slice(i, i + 20) takes the next 20-item window; slice's second argument is an end index.
        const itemsToDelete = records.slice(i, i + 20);
        postItemsForDeletion(itemsToDelete);
      }
      
  • Another Lambda receives these items and deletes them:

    // itemsForDeletion is an array of { DeleteRequest: { Key: ... } } entries.
    await dbClient.send(
      new BatchWriteItemCommand({ RequestItems: { [tableName]: itemsForDeletion } }),
    );
  • The listing Lambda receives the call to read items from the next cursor, and the above steps get repeated.

This all happens in parallel: fetching items, posting a message to grab the next 1000 items, and posting messages for deletion. A simplified sketch of the listing Lambda is below.
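For clarity, here is roughly what the listing Lambda does end to end. This is a simplified sketch, not my exact code: the table name, the key names, and the SQS helper functions are placeholders.

    import {
      AttributeValue,
      DynamoDBClient,
      QueryCommand,
    } from '@aws-sdk/client-dynamodb';

    const dbClient = new DynamoDBClient({});

    // Placeholder stubs for the SQS-publishing helpers described above.
    function postMessageToFetchNextItems(cursor: Record<string, AttributeValue>): void {
      /* send an SQS message carrying the cursor */
    }
    function postItemsForDeletion(items: Record<string, AttributeValue>[]): void {
      /* send an SQS message carrying up to 20 items */
    }

    export async function listItemsHandler(event: {
      partitionKey: string;
      cursor?: Record<string, AttributeValue>;
    }) {
      const result = await dbClient.send(
        new QueryCommand({
          TableName: 'my-table', // placeholder table name
          KeyConditionExpression: 'pk = :pk',
          ExpressionAttributeValues: { ':pk': { S: event.partitionKey } },
          Limit: 1000,
          ExclusiveStartKey: event.cursor, // cursor from the previous SQS message, if any
        }),
      );

      // If DynamoDB returned a cursor, there are more items: post a message for the next page.
      if (result.LastEvaluatedKey) {
        postMessageToFetchNextItems(result.LastEvaluatedKey);
      }

      // Fan out deletion messages in batches of 20 items each.
      const items = result.Items ?? [];
      for (let i = 0; i < items.length; i += 20) {
        postItemsForDeletion(items.slice(i, i + 20));
      }
    }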

While this looks good on paper, it doesn't seem to delete all records from DynamoDB. There is no set pattern; some items always remain in the table. I am not entirely sure what is happening, but I have a theory that the parallel deletion and listing could be causing the issue. I was unable to find any documentation to verify this theory, hence this question.


1 Answer


A BatchWriteItem call can return a list of unprocessed items. You should check for that and retry them.

Look at the docs for BatchWriteItemCommand (https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-dynamodb/classes/batchwriteitemcommand.html) and search for `UnprocessedItems`.

Fundamentally, a batch write items call is not a transactional write. It's possible for some item writes to succeed while others fail. It's on you to check for failures and retry them. I'm sorry I don't have a link for good sample code.
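As a rough sketch of the idea (not official sample code; assume `TABLE_NAME` is your table, the input is already an array of `DeleteRequest` entries, and the backoff numbers are arbitrary), the delete Lambda could loop until `UnprocessedItems` comes back empty:

    import {
      BatchWriteItemCommand,
      DynamoDBClient,
      WriteRequest,
    } from '@aws-sdk/client-dynamodb';

    const dbClient = new DynamoDBClient({});
    const TABLE_NAME = 'my-table'; // placeholder

    // requests is an array of { DeleteRequest: { Key: ... } } entries (max 25 per call).
    async function batchDeleteWithRetry(requests: WriteRequest[]): Promise<void> {
      let pending = requests;
      let attempt = 0;
      while (pending.length > 0) {
        const result = await dbClient.send(
          new BatchWriteItemCommand({ RequestItems: { [TABLE_NAME]: pending } }),
        );
        // Anything DynamoDB could not process comes back here and must be retried.
        pending = result.UnprocessedItems?.[TABLE_NAME] ?? [];
        if (pending.length > 0) {
          attempt += 1;
          // Simple exponential backoff so retries don't hammer a throttled table.
          await new Promise((resolve) => setTimeout(resolve, 2 ** attempt * 100));
        }
      }
    }

Note that the SDK's built-in retry logic only covers failed HTTP requests; `UnprocessedItems` arrive in a successful response, so the caller has to retry them explicitly, as above.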

hunterhacker
  • Thanks for your reply. I am already checking if there are any `UnprocessedItems` and have added code to log the error in that case. But unfortunately, there were no such errors. – abhinav Sep 01 '22 at 17:37