We have a use case for a data pipeline solution that needs the following:

  1. Ability to have multiple steps (the output of one step should feed in as the input to the next)
  2. Ability to have multiple algorithms (a SQL query, or possibly a call to a REST endpoint) in each step.

The input to the first step can be anything. We have DW tables, but we can pre-process them and keep the relevant information in AWS S3 or another data store.

Something like this: [image: Data Pipeline]
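To make the two requirements concrete, here is a minimal plain-Python sketch of the shape being asked for. It is not tied to any AWS service. The names `Step` and `run_pipeline` and the toy algorithms are hypothetical and only for illustration; in a real setup each algorithm would be a SQL query or a REST call rather than a local function.

```python
from typing import Any, Callable, Dict, List

# An "algorithm" is any callable that takes the step's input and returns a result.
# Here these are local functions; they stand in for a SQL query or a REST call.
Algorithm = Callable[[Any], Any]


class Step:
    """One pipeline step holding several named algorithms (hypothetical helper)."""

    def __init__(self, name: str, algorithms: Dict[str, Algorithm]) -> None:
        self.name = name
        self.algorithms = algorithms

    def run(self, data: Any) -> Dict[str, Any]:
        # Every algorithm in the step receives the same input;
        # the step's output maps each algorithm name to its result.
        return {algo_name: algo(data) for algo_name, algo in self.algorithms.items()}


def run_pipeline(steps: List[Step], initial_input: Any) -> Any:
    """Run the steps in order, feeding each step's output into the next step."""
    data = initial_input
    for step in steps:
        data = step.run(data)
    return data


def filter_positive(rows: List[int]) -> List[int]:
    # Toy stand-in for a pre-processing query.
    return [r for r in rows if r > 0]


steps = [
    Step("clean", {"filter_positive": filter_positive}),
    Step(
        "aggregate",
        {
            # Both algorithms read the output of the previous step.
            "sum": lambda out: sum(out["filter_positive"]),
            "count": lambda out: len(out["filter_positive"]),
        },
    ),
]

result = run_pipeline(steps, [3, -1, 4])
print(result)
```

The second step has two algorithms that both consume the first step's output, which is the "multiple algorithms per step" requirement. A managed orchestrator would add scheduling, retries and monitoring on top of this basic shape.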

Is there an existing solution that already provides functionality similar to this, or that can be modified to support it?

Having something in AWS would be easier to integrate.

  • Looks like your requirements fit Airflow. Have you checked MWAA, the Amazon-managed version of Airflow? https://docs.aws.amazon.com/mwaa/latest/userguide/what-is-mwaa.html – mjeday Jan 27 '22 at 15:25

1 Answer

How about AWS Glue? It sounds like a fit for your goals.

Rownum Highart
  • Can you explain how? – Prakhar Awasthi Jan 27 '22 at 15:52
  • An excerpt from the documentation: "AWS Glue provides both visual and code-based interfaces to make data integration easier. Users can easily find and access data using the AWS Glue Data Catalog. Data engineers and ETL (extract, transform, and load) developers can visually create, run, and monitor ETL workflows with a few clicks in AWS Glue Studio. Data analysts and data scientists can use AWS Glue DataBrew to visually enrich, clean, and normalize data without writing code." The full documentation can be found at https://aws.amazon.com/glue/ – Rownum Highart Jan 27 '22 at 16:04