
I'm trying to improve the performance of a Lambda function that writes thousands of items to a DynamoDB table. I came across a suggestion to use the aioboto3 library for async writes, but I'm not seeing an improvement with my current implementation (simplified code below).

I found a similar-sounding question (aioboto3 speedup not as expected), but it isn't about DynamoDB writes, and I don't think the event loop is being overloaded in my case: batch_writer sends batches of 25 items, so even 10,000 writes would only produce 400 batches.

This is my first time using aioboto3/asyncio and I'm new to async concepts, so if I'm completely off base here or there's a better way to improve performance, please let me know. Any nudges in the right direction would be much appreciated!

import asyncio

import aioboto3

async def putMessages(msgs):
    async with aioboto3.resource('dynamodb', region_name='us-east-1') as dynamo_resource:
        table = await dynamo_resource.Table('Messages')

        async with table.batch_writer() as dynamo_writer:
            for msg in msgs:
                await dynamo_writer.put_item(Item={
                    'msgID': msg['msgID'],
                    'msgType': msg['msgType'],
                })

def lambda_handler(event, context):
    msgs = getMessages()

    loop = asyncio.get_event_loop()
    loop.run_until_complete(putMessages(msgs))
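For reference, here's a minimal sketch with no AWS involved (`fake_put` is an illustrative stand-in I wrote for a ~0.1 s network call) showing how much awaiting each call inside the loop serializes things compared to running them concurrently with `asyncio.gather`:

```python
import asyncio
import time

async def fake_put(_item):
    # stand-in for a network call (e.g. a DynamoDB write) taking ~0.1 s
    await asyncio.sleep(0.1)

async def sequential(items):
    for it in items:
        await fake_put(it)  # each call starts only after the previous one finishes

async def concurrent(items):
    await asyncio.gather(*(fake_put(it) for it in items))  # all calls in flight at once

async def main():
    start = time.perf_counter()
    await sequential(range(10))
    seq = time.perf_counter() - start

    start = time.perf_counter()
    await concurrent(range(10))
    conc = time.perf_counter() - start
    return seq, conc

seq, conc = asyncio.run(main())
print(f"sequential: {seq:.2f}s, concurrent: {conc:.2f}s")
```

As I understand it, ten sequential 0.1 s awaits take about a second in total, while gathering them takes roughly the duration of a single call.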

Edit 06.27.2020:
Big performance improvement from using asyncio.gather on a list of futures... but now I'm having an issue where only some of the writes are making it through. I tested at different volumes and saw write success counts like ~6,000 / 10,000, ~3,000 / 4,000, and ~400 / 500. I don't think it's write throttling either... I increased my write capacity units and saw similar results with zero throttles. Any idea what might be causing this? Here's the updated code:

async def putMessages(msgs):
    async with aioboto3.resource('dynamodb', region_name='us-east-1') as dynamo_resource:
        table = await dynamo_resource.Table('Messages')

        async with table.batch_writer() as dynamo_writer:
            writes = []
            for msg in msgs:
                future = asyncio.ensure_future(
                    dynamo_writer.put_item(Item={
                        'msgID': msg['msgID'],
                        'msgType': msg['msgType'],
                    }))
                writes.append(future)
            await asyncio.gather(*writes)

def lambda_handler(event, context):
    msgs = getMessages()

    loop = asyncio.get_event_loop()
    loop.run_until_complete(putMessages(msgs))

Edit 2, 06.27.2020:
I took another look at the CloudWatch metrics and realized there actually was some throttling even with the increased WCUs. But the provisioned capacity was never exceeded, and I don't think it could be a hot-partition issue because the provisioned WCUs are below the 1,000-per-partition limit anyway... so I'm not really sure what's going on here.
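One possibility I'm looking into: BatchWriteItem returns throttled items under UnprocessedItems instead of raising an error, so writes can silently disappear unless they're retried. Here's a sketch of an alternative I'm considering: chunking into 25-item batches myself with the low-level client, retrying unprocessed items with backoff, and capping concurrency. Note the assumptions: this uses the newer aioboto3 API where clients come from aioboto3.Session() (my code above used the older module-level resource); chunk and write_batch are helper names I made up; and both item attributes are assumed to be strings, since the low-level client needs typed values:

```python
import asyncio

def chunk(items, size=25):
    # BatchWriteItem accepts at most 25 put/delete requests per call
    return [items[i:i + size] for i in range(0, len(items), size)]

async def write_batch(client, batch, table='Messages', max_retries=5):
    # low-level API needs typed attribute values; assuming strings here
    requests = [{'PutRequest': {'Item': {
        'msgID': {'S': msg['msgID']},
        'msgType': {'S': msg['msgType']},
    }}} for msg in batch]
    for attempt in range(max_retries):
        resp = await client.batch_write_item(RequestItems={table: requests})
        unprocessed = resp.get('UnprocessedItems', {}).get(table)
        if not unprocessed:
            return
        # DynamoDB returned some items unwritten (e.g. throttling): retry just those
        requests = unprocessed
        await asyncio.sleep(0.1 * 2 ** attempt)  # exponential backoff
    raise RuntimeError(f'{len(requests)} items still unprocessed after {max_retries} tries')

async def putMessages(msgs):
    import aioboto3  # imported here so the helpers above are usable without AWS deps
    session = aioboto3.Session()
    async with session.client('dynamodb', region_name='us-east-1') as client:
        sem = asyncio.Semaphore(10)  # cap in-flight batches to limit throttling

        async def limited(batch):
            async with sem:
                await write_batch(client, batch)

        await asyncio.gather(*(limited(b) for b in chunk(msgs)))
```

Capping in-flight batches with the semaphore should also take some pressure off the table; the limit of 10 is a guess I'd tune against the provisioned WCUs.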
    I'm not a JavaScript programmer, but it looks like you're waiting for `dynamo_writer.put_item()` to complete before sending the next message. – Parsifal Jun 26 '20 at 16:17
