I'm a beginner in AWS, so please bear with me if certain things are a bit off :)
I have a task where I need to load a fixed-width text file that contains both a header record and a footer record, and of course a lot of data records in between. The data needs some simple changes before it is written to the destination file, which should also be a fixed-width file.
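To make the file layout concrete, here is a rough plain-Python sketch of what I mean by parsing and re-writing such a file. Everything specific in it is an assumption for illustration: the field names, the widths, the H/D/T record-type prefixes, and the upper-casing that stands in for the "simple changes". It is not Glue or Spark code, just the logic I would need to reproduce there.

```python
# Hypothetical layout: field names, widths, and the H/D/T prefixes are invented for illustration.
DETAIL_FIELDS = [("record_type", 1), ("customer_id", 6), ("name", 10), ("amount", 8)]


def parse_fixed_width(line, fields):
    """Slice one line into a dict according to (name, width) pairs."""
    record, pos = {}, 0
    for name, width in fields:
        record[name] = line[pos:pos + width].strip()
        pos += width
    return record


def format_fixed_width(record, fields):
    """Truncate or left-justify each value to its width and join them into one line."""
    return "".join(str(record[name])[:width].ljust(width) for name, width in fields)


def transform_file(lines):
    """Pass the first (header) and last (footer) lines through; transform the detail lines."""
    header, *details, footer = lines  # assumes at least a header and a footer line
    out = [header]
    for line in details:
        rec = parse_fixed_width(line, DETAIL_FIELDS)
        rec["name"] = rec["name"].upper()  # stand-in for the real "simple changes"
        out.append(format_fixed_width(rec, DETAIL_FIELDS))
    out.append(footer)
    return out
```

In this sketch the header and footer are passed through unchanged; if the footer holds a record count or checksum, it would have to be recomputed as well.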
I would like to use AWS Glue for this, but I am not sure how to approach it. Since the data has a header and a footer, I guess Spark would be my best option for both reading and writing the file?
The Glue job should be triggered when the input file is uploaded to an S3 bucket.
What would be the flow here?
- Uploading file to S3
- S3 notification event triggering what? Lambda?
- Lambda starting the Glue job with a Spark script that: a) loads the txt file data into a table, b) reads and transforms the data, c) writes the txt file to S3
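
For the Lambda step, this is roughly what I picture: a minimal, untested sketch that assumes the function is subscribed to the bucket's object-created notifications. The job name and the two argument names (`--input_bucket`, `--input_key`) are placeholders I made up; `start_job_run` is the boto3 Glue client call for starting a job run.

```python
import urllib.parse

GLUE_JOB_NAME = "fixed-width-transform"  # hypothetical job name


def build_job_arguments(event):
    """Turn an S3 object-created event into Glue job arguments."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications (a space becomes '+').
    key = urllib.parse.unquote_plus(s3_info["object"]["key"])
    # Glue job arguments are "--name": "value" pairs; these two names are made up.
    return {"--input_bucket": bucket, "--input_key": key}


def lambda_handler(event, context):
    # Imported here so build_job_arguments can be tried locally without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=GLUE_JOB_NAME,
        Arguments=build_job_arguments(event),
    )
    return {"JobRunId": response["JobRunId"]}
```

The idea is that the Glue script would then read the bucket and key from its job arguments instead of having a fixed path.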
Do I need a crawler in between somewhere?
Thanks in advance.