My objective is to read data from files in S3, transform it, and save it to a data store (either DynamoDB or RDS). Each file would typically be under 20 MB, and multiple (~10) such files would be uploaded periodically (once a day). I'm considering the two approaches below.
- AWS Lambda
- AWS Batch
Ideally, processing a file should take less than 15 minutes, but since there is no guarantee on file size, processing could in theory exceed Lambda's limits. The approach I have in mind is to check beforehand whether the file can be processed within Lambda's constraints: if yes, invoke the Lambda; otherwise, trigger a Batch job (see the sketch below). As of now I'm leaning toward DynamoDB. There is no hard guarantee that items will stay under the 400 KB item-size limit, though in practice they should. Would my proposed design be any different if I switched the database to RDS?
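Here is a minimal sketch of the size-based dispatch I have in mind, as a Lambda handler triggered by an S3 upload event. The size threshold, the Batch job queue and job definition names, and `process_file` are all placeholders I made up for illustration, not real resources:

```python
import boto3

s3 = boto3.client("s3")
batch = boto3.client("batch")

# Hypothetical cutoff: anything larger than this is assumed to exceed
# what the Lambda can finish within its timeout.
SIZE_THRESHOLD_BYTES = 100 * 1024 * 1024


def process_file(bucket, key):
    """Placeholder for the actual transform-and-load step (DynamoDB/RDS)."""
    ...


def handler(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        # HeadObject returns the object's size without downloading it.
        size = s3.head_object(Bucket=bucket, Key=key)["ContentLength"]

        if size <= SIZE_THRESHOLD_BYTES:
            # Small enough: process inline in this Lambda invocation.
            process_file(bucket, key)
        else:
            # Too large: hand the file off to a pre-registered Batch job,
            # passing the S3 location via environment variables.
            batch.submit_job(
                jobName="process-s3-file",            # placeholder name
                jobQueue="file-processing-queue",     # placeholder queue
                jobDefinition="file-processing-job",  # placeholder definition
                containerOverrides={
                    "environment": [
                        {"name": "BUCKET", "value": bucket},
                        {"name": "KEY", "value": key},
                    ]
                },
            )
```

One caveat with this sketch: object size is only a proxy for processing time, so the threshold would need tuning against real files.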
Another question I have is when to consider traditional ETL approaches such as AWS Data Pipeline, EMR, or Glue.