Looking for a better way to visualize a data lake pipeline on AWS

Question

I am building a data lake pipeline on AWS that uses many AWS services, such as S3, CloudWatch, Lambda, Glue crawlers, and Glue jobs. The pipeline flow works like this:

- CloudWatch schedules a cron job that triggers a Lambda to fetch external data and save it in an S3 bucket.
- Whenever a file is uploaded to the S3 bucket, a Lambda is triggered, which starts a Glue crawler.
- CloudWatch listens for the Glue crawler state change and triggers a Lambda, which calls a Glue job to do the data ETL.
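For context, the two glue-related Lambdas in the steps above could look roughly like the sketch below. This is only an illustration of the flow, not my actual code: the crawler and job names (`example-crawler`, `example-etl-job`), the environment variable names, and the handler names are all hypothetical, and the `glue` parameter exists only so the handlers can be exercised without AWS access. It assumes the standard S3 notification record layout and the `crawlerName` / `state` fields of the "Glue Crawler State Change" event.

```python
import os

# Hypothetical names, assumed to be configured on the Lambda functions.
CRAWLER_NAME = os.environ.get("CRAWLER_NAME", "example-crawler")
JOB_NAME = os.environ.get("JOB_NAME", "example-etl-job")


def _glue_client():
    # Imported lazily so the module can be loaded without boto3 installed.
    import boto3
    return boto3.client("glue")


def on_s3_upload(event, context=None, glue=None):
    """Step 2: an S3 upload notification starts the Glue crawler."""
    # Collect the object keys from the S3 notification records.
    keys = [r["s3"]["object"]["key"] for r in event.get("Records", [])]
    if keys:
        if glue is None:
            glue = _glue_client()
        glue.start_crawler(Name=CRAWLER_NAME)
    return keys


def on_crawler_state_change(event, context=None, glue=None):
    """Step 3: a successful crawler run starts the Glue ETL job."""
    detail = event.get("detail", {})
    # Ignore other crawlers and any state other than a successful finish.
    if detail.get("crawlerName") != CRAWLER_NAME:
        return None
    if detail.get("state") != "Succeeded":
        return None
    if glue is None:
        glue = _glue_client()
    response = glue.start_job_run(JobName=JOB_NAME)
    # The job run ID is the only handle for finding this run's logs later.
    return response["JobRunId"]
```

Each handler only knows about its own step, which is why the only cross-step trace I have is whatever each one writes to its own log group.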

It works fine, but I find it hard to monitor the whole process. All I get are the logs saved in CloudWatch and some notifications and alerts. Is there a better way to monitor this pipeline, such as viewing it as a workflow diagram that shows each execution?

score 0 · Answer 1 · answered Aug 08 '19 at 01:44

You can try AWS X-Ray. AWS X-Ray helps developers analyze and debug distributed production applications, such as those built using a microservices architecture. It traces user requests as they travel through your entire application and aggregates the data generated by the individual services and resources that make up the application, giving you an end-to-end view of how it is performing. See the AWS X-Ray documentation for more details.

answered Aug 08 '19 at 01:44

Md khirul ashik

X-Ray can be used for Lambda, load balancers, EC2, SNS, etc., but not Glue. – Joey Yi Zhao Aug 08 '19 at 03:24
