I am building a data lake pipeline on AWS that uses several services: S3, CloudWatch, Lambda, Glue crawlers, Glue jobs, etc. The pipeline flow works like this:
- A CloudWatch scheduled (cron) rule triggers a Lambda that fetches external data and saves it to an S3 bucket.
- Whenever a file is uploaded to the S3 bucket, a second Lambda is triggered, which starts a Glue crawler.
- CloudWatch listens for the Glue crawler state change and triggers a Lambda that starts a Glue job to do the ETL.
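To make the flow concrete, here is a minimal sketch of what the two Glue-related Lambda handlers could look like, assuming Python with boto3. The crawler name, job name, handler names, and the optional `glue` parameter are hypothetical placeholders for illustration, not the actual code in the pipeline.

```python
# Hypothetical resource names, used only for illustration.
CRAWLER_NAME = "my-data-lake-crawler"
JOB_NAME = "my-etl-job"


def _glue_client():
    # boto3 is imported lazily so the handlers can be exercised locally
    # with a stub client and no AWS credentials.
    import boto3
    return boto3.client("glue")


def on_s3_upload(event, context, glue=None):
    """Runs on an S3 ObjectCreated notification and starts the crawler."""
    if glue is None:
        glue = _glue_client()
    # Collect the uploaded object keys so they show up in the return value/logs.
    keys = [r["s3"]["object"]["key"] for r in event.get("Records", [])]
    glue.start_crawler(Name=CRAWLER_NAME)
    return {"crawler": CRAWLER_NAME, "keys": keys}


def on_crawler_state_change(event, context, glue=None):
    """Runs on a 'Glue Crawler State Change' event and starts the ETL job."""
    detail = event.get("detail", {})
    # Only react when our crawler finished successfully.
    if detail.get("crawlerName") != CRAWLER_NAME or detail.get("state") != "Succeeded":
        return {"started": False}
    if glue is None:
        glue = _glue_client()
    run = glue.start_job_run(JobName=JOB_NAME)
    return {"started": True, "jobRunId": run["JobRunId"]}
```

Each handler only knows about the next step, so the overall execution state is spread across separate Lambda invocations and their individual log streams.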
It works fine, but I find it hard to monitor the whole process. All I get are the logs saved in CloudWatch and some notifications / alerts. Is there a better way to monitor this pipeline, for example viewing it as a workflow diagram where I can see each execution?