
I am designing a database that could easily be represented as a large collection of files containing fixed-size records with sequence numbers 0, 1, ... This could fit nicely in DynamoDB, with the filename as the partition key and the record sequence number as the sort key, but I am thinking about just using loose files on EFS. I don't need any replication, as this is already a replica in a fault-tolerant system. My Lambda functions won't need any operations fancier than reading, writing, or updating an individual record, which will always be at a known offset in a known file. There may be hundreds of simultaneously active Lambdas, but they will usually be accessing different files. It looks like I can use fcntl/lockf to handle any contention.

On a back-of-the-envelope estimate, using raw files will cut the cost at least in half, and I'm guessing it will also perform better. What are some reasons I might regret doing this?
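
As a rough illustration of the access pattern described in the question (fixed-size records at known offsets in a known file), a minimal Python sketch might look like the following. `RECORD_SIZE`, the function names, and the zero-padding convention are assumptions made for the example; the question does not specify a record size or layout, and no locking is shown here.

```python
import os

RECORD_SIZE = 64  # assumed record size in bytes; not given in the question


def record_offset(seq: int) -> int:
    """Byte offset of record number `seq` (records are numbered 0, 1, ...)."""
    return seq * RECORD_SIZE


def write_record(path: str, seq: int, payload: bytes) -> None:
    """Write one record at its fixed offset, creating the file if needed."""
    if len(payload) > RECORD_SIZE:
        raise ValueError("payload larger than record size")
    data = payload.ljust(RECORD_SIZE, b"\x00")  # pad to the fixed size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # pwrite takes an explicit offset, so no seek is needed.
        written = os.pwrite(fd, data, record_offset(seq))
        if written != RECORD_SIZE:
            raise OSError("short write")
    finally:
        os.close(fd)


def read_record(path: str, seq: int) -> bytes:
    """Read one record from its fixed offset."""
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, RECORD_SIZE, record_offset(seq))
    finally:
        os.close(fd)
    if len(data) != RECORD_SIZE:
        raise IndexError(f"record {seq} is past the end of {path}")
    return data
```

Writing a record beyond the current end of the file leaves a hole that reads back as zero bytes, so a real format would probably need some way to tell an unwritten record from a written one.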

user1055568

2 Answers


I guess it should be possible. This blog post has some useful info…

https://aws.amazon.com/blogs/compute/using-amazon-efs-for-aws-lambda-in-your-serverless-applications/

MLu

Based on some simple experiments, this approach is feasible. Using Rust, I was able to update a file in around 20 ms of Lambda runtime (on repeat invocations). Using fcntl advisory locking (F_SETLKW), I had no problem with a handful of concurrent Lambdas contending for the same file. With locking, latency went up to around 30 ms when there was no contention, plus the expected wait times during contention.
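
The locked update described above can be sketched as follows. This is Python rather than the Rust used in the experiments, and it is only an illustration under assumptions: the record is taken to be a single little-endian 64-bit counter, and `increment_record` and `RECORD_SIZE` are hypothetical names. In CPython, a blocking `fcntl.lockf` call is issued as an `fcntl` `F_SETLKW` request, and the sketch locks only the byte range of the record being updated.

```python
import fcntl
import os
import struct

RECORD_SIZE = 8  # assumed: each record is one little-endian unsigned 64-bit counter
_COUNTER = struct.Struct("<Q")


def increment_record(path: str, seq: int, delta: int = 1) -> int:
    """Atomically add `delta` to record `seq` and return the new value."""
    offset = seq * RECORD_SIZE
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Block until this record's byte range is free (advisory lock).
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD_SIZE, offset, os.SEEK_SET)
        try:
            raw = os.pread(fd, RECORD_SIZE, offset)
            # A record past the end of the file is treated as zero.
            current = _COUNTER.unpack(raw)[0] if len(raw) == RECORD_SIZE else 0
            updated = current + delta
            os.pwrite(fd, _COUNTER.pack(updated), offset)
            return updated
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD_SIZE, offset, os.SEEK_SET)
    finally:
        os.close(fd)
```

These locks are advisory and owned per process, so they only protect against other processes that take the same lock; threads inside one process would need their own synchronization.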

These numbers seem to be in the same ballpark as DynamoDB. However, I have seen recommendations in many places to avoid EFS for workloads involving many small files. I guess that advice is often relative to EBS, though, and ignores the simple concurrency I can achieve with one Lambda per request.

My guess is that to beat this performance with DynamoDB, I would need to use Simple Queue Service to limit the number of concurrent Lambdas.

One of the biggest drawbacks I foresee is upgradeability. Things will rapidly get messy if the file formats ever need to change, and any queries beyond the core workflow will require custom implementations.

user1055568