I've connected a lambda to a DyDB table via a stream. When a record is written to the table, it triggers the lambda. The traffic is very bursty, so nothing might happen for a while, then I'll write several thousand records.
What I'm seeing is a few lambda instances will be triggered, but not enough to handle the burst. Then at random times, the number of lambda instances will jump an order of magnitude or two (from 2 to 90 or more), and it will catch up. The problem is the jump might not occur for 30 minutes or more.
I'm seeing the records written to the table very quickly (seconds). The processing of 20 records by the lambda shouldn't take more than 2 minutes. It seems like the lambdas are spending most of their time sitting around waiting for records to show up. The record key for the table is a GUID.
Things I've tried
- Playing with the number of records to make sure there's no lambda timeouts (20 seems to be conservative, but 100 causes timeouts)
- Moving the lambda to a different subnet
- Batching the writes to the table (~500-1000 records in a batch)
- Breaking up the writes in hopes it would trigger more lambdas (~20-100 records in a batch)
- Increasing the lambda memory to the max (3GB)
- Reducing memory to be larger than used (1GB, 300Mb used)
Is there a better pattern to be using? Should I skip the stream and just write SNS messages? I don't care about order, but would prefer to not run the job more than once.