I have a use case where I need to write thousands of items into a DynamoDB table (a lookup table) from a dataframe in AWS Glue. I also want the entire write transaction to fail/roll back if the AWS Glue job fails, so that we have a cleaner restart mechanism. I have the questions below and want to understand if I am on the right track:
I see that the actions within a single transact_write_items call are atomic, but since one call can only support 25 items at a time, is this truly atomic across my whole write? Say my first 4 batches of 25 Put items succeed and the 5th batch fails; isn't it only the 5th batch whose insertion fails? Wouldn't I still end up with 100 items in the DynamoDB table despite the failure?
I am also surprised that I cannot find a simple way to pass my dataframe rows to transact_write_items in batches of 25 rows/items. Has anyone tried this before? Below is the sample I am using, but I reckon it's writing one row per call:
try:
    for index, row in df_read.iterrows():
        print(row)
        # One call per row, so each row becomes its own single-item transaction
        cli.transact_write_items(
            TransactItems=[{
                'Put': {
                    'TableName': dynamodb_table_name,
                    # The low-level client expects typed attribute values;
                    # assuming ID and MARK are strings here
                    'Item': {
                        'ID': {'S': str(row.ID)},
                        'MARK': {'S': str(row.MARK)}
                    }
                }
            }]
        )
except botocore.exceptions.ClientError as e:
    print(e)
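
For reference, this is roughly the kind of batching I was hoping to do instead. It's only a sketch of my idea, not something I've verified at scale: I'm assuming ID and MARK are string attributes, that cli is the low-level boto3 DynamoDB client, and that df_read and dynamodb_table_name are defined earlier in the Glue job as in the snippet above.

import boto3
import botocore

cli = boto3.client('dynamodb')

def chunks(items, size=25):
    # Yield successive slices of at most `size` elements
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Build every Put request up front from the dataframe rows
puts = [
    {
        'Put': {
            'TableName': dynamodb_table_name,
            'Item': {
                'ID': {'S': str(row.ID)},
                'MARK': {'S': str(row.MARK)}
            }
        }
    }
    for _, row in df_read.iterrows()
]

try:
    # Each call of up to 25 items is atomic on its own, but batches that
    # already succeeded stay written if a later batch fails
    for batch in chunks(puts):
        cli.transact_write_items(TransactItems=batch)
except botocore.exceptions.ClientError as e:
    print(e)

Is something like this the right approach, given that I seemingly can't get true all-or-nothing behaviour across thousands of items anyway?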