
I have a use case to write thousands of items into a DynamoDB table (a lookup table) from a dataframe in AWS Glue. I also want the entire write to fail/roll back if the AWS Glue job fails, so that we have a cleaner restart mechanism. I have the questions below and want to understand whether I am on the right track:

  1. I see that the actions within transact_write_items are atomic, but since this call can only support 25 items at a time, is this truly atomic? Say my first 4 sets of 25 PUT items succeed and the 5th set fails; isn't it only the 5th set whose insertion fails completely? I would still end up with 100 items in the DynamoDB table despite the failure?

  2. I am surprised that I cannot find a simple way to pass my dataframe rows to transact_write_items in batches of 25 rows/items. Has anyone tried this before? Below is a sample I am using, but I reckon it's doing one row at a time.

    import boto3
    import botocore

    cli = boto3.client('dynamodb')

    try:
        for index, row in df_read.iterrows():
            print(row)
            # One transact_write_items call per row, i.e. a single-item transaction.
            # The low-level client expects DynamoDB-typed attribute values,
            # e.g. {'S': ...} for strings.
            cli.transact_write_items(
                TransactItems=[{
                    'Put': {
                        'TableName': dynamodb_table_name,
                        'Item': {
                            'ID': {'S': str(row.ID)},
                            'MARK': {'S': str(row.MARK)}
                        }
                    }
                }]
            )
    except botocore.exceptions.ClientError as e:
        print(e)

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/dynamodb.html#DynamoDB.Client.transact_write_items

Prabhu

1 Answer

  1. You are correct. Each transact_write_items call is atomic only for the (up to 25) items in that call; there is no transaction spanning multiple calls, so loading a large dataset is effectively an eventually consistent operation and you are responsible for handling rollback logic appropriate for your use case.

You probably want to use BatchWriteItem instead of TransactWriteItems to save some money (transactional writes consume twice the write capacity) and speed up the processing time.

  2. Buffer the rows and call BatchWriteItem (or TransactWriteItems) whenever the buffer reaches 25 items, or when you hit the final row, as in the sketch below.
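
Here is a minimal sketch of that batching approach, assuming a pandas DataFrame named df_read with string attributes ID and MARK and the table name in dynamodb_table_name (names taken from your question). The retry loop for UnprocessedItems is deliberately simple; real code should add exponential backoff:

    import boto3

    cli = boto3.client('dynamodb')

    def flush(batch):
        # Write up to 25 put requests; retry anything DynamoDB reports as unprocessed.
        response = cli.batch_write_item(RequestItems={dynamodb_table_name: batch})
        unprocessed = response.get('UnprocessedItems', {})
        while unprocessed:
            response = cli.batch_write_item(RequestItems=unprocessed)
            unprocessed = response.get('UnprocessedItems', {})

    batch = []
    for _, row in df_read.iterrows():
        batch.append({
            'PutRequest': {
                'Item': {
                    'ID': {'S': str(row.ID)},
                    'MARK': {'S': str(row.MARK)}
                }
            }
        })
        if len(batch) == 25:   # BatchWriteItem accepts at most 25 requests per call
            flush(batch)
            batch = []

    if batch:                  # flush the final partial batch
        flush(batch)

If you prefer the higher-level resource API, boto3's Table.batch_writer() context manager handles the 25-item chunking and the UnprocessedItems retries for you. Either way, note that BatchWriteItem gives up the per-call atomicity of TransactWriteItems, which ties back to point 1: you still need your own restart/rollback handling.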
Ross Williams