My use case is as follows:
I have a Python script which:
1. reads a file from S3
2. processes the file and outputs a new file
3. saves the output file to S3 (or maybe a database)
The Python script has some dependencies, which are managed via virtualenv.
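For concreteness, a minimal sketch of what the script does, assuming boto3; the bucket name, keys, and the `process` body are placeholders, not my real code:

```python
import boto3

s3 = boto3.client("s3")

def process(data: bytes) -> bytes:
    # Placeholder for the actual processing step.
    return data.upper()

def main(input_key: str, output_key: str, bucket: str = "my-bucket") -> None:
    # 1. Read the input file from S3.
    obj = s3.get_object(Bucket=bucket, Key=input_key)
    data = obj["Body"].read()

    # 2. Process the file and produce the output.
    result = process(data)

    # 3. Save the output file back to S3.
    s3.put_object(Bucket=bucket, Key=output_key, Body=result)

if __name__ == "__main__":
    main("inputs/example.csv", "outputs/example.csv")
```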
What is the recommended/easiest way of running these scripts in parallel on AWS?
I see the following options:
- AWS Batch: Looks really complicated - I have to build my own Docker container and set up 3 different users, and it's not easy to debug.
- AWS Lambda: A bit easier to set up, but I still have to wrap my script up into a Lambda function (see the handler sketch after this list). Debugging doesn't seem too straightforward.
- Slurm on manually spun-up EC2 instances: From a user perspective this is ideal - all I would have to do is create a jobs.sbatch file that loads the virtualenv and runs the script. The main drawback is that I have to install and configure Slurm myself.
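For the Lambda option, the wrapping I have in mind is roughly the following sketch; it assumes the `main()` function from the script above lives in a module called `my_script`, and the event field names are my own invention, not a fixed AWS schema:

```python
# Hypothetical Lambda wrapper around the same script; parallelism would come
# from invoking the function once per input file (e.g. via S3 events or a loop).
from my_script import main  # hypothetical module containing the script above

def lambda_handler(event, context):
    main(
        input_key=event["input_key"],
        output_key=event["output_key"],
        bucket=event.get("bucket", "my-bucket"),
    )
    return {"status": "ok", "output_key": event["output_key"]}
```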
What is the recommended way of handling this workflow?