I am very new to data architecture, and I want to build an end-to-end architecture:
- Source: Snowflake Tables
- Target: Snowflake Tables
In between we have to do some processing; here is the flow:
- We export data from Snowflake tables (specific columns, using joins) to AWS S3 (see the unload sketch after this list).
- These files are then consumed by AWS SageMaker (the Python code to process them is already written), but I still need to build the pipeline around it.
- Once processing is done, the processed data is written back to AWS S3 (a different bucket).
- Finally, these files need to be loaded back into Snowflake tables.
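For the unload step, I am assuming something along these lines: a `COPY INTO` against an external S3 stage, run from Python with the Snowflake connector. The connection details, stage, table, and column names below are placeholders for my actual objects.

```python
# Minimal sketch: unload the joined columns from Snowflake into S3 via an external stage.
# Connection details, stage, table, and column names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="...",            # use a secrets manager rather than a hard-coded password
    warehouse="my_wh",
    database="my_db",
    schema="my_schema",
)

unload_sql = """
COPY INTO @my_s3_stage/exports/input_
FROM (
    SELECT a.id, a.col1, b.col2
    FROM table_a a
    JOIN table_b b ON a.id = b.a_id
)
FILE_FORMAT = (TYPE = CSV)
HEADER = TRUE
OVERWRITE = TRUE
"""

with conn.cursor() as cur:
    cur.execute(unload_sql)
conn.close()
```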
Requirement:
I need to link all these tools and create a workflow.
Firstly, how do I create a SageMaker pipeline around code that is already written?
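To make the question concrete, this is roughly the pipeline definition I think I need: the existing Python script runs inside a ProcessingStep, and the input location is exposed as a pipeline parameter so it can be passed in per execution. The role ARN, bucket names, script path, and the choice of the pre-built scikit-learn container are all assumptions on my part.

```python
# Sketch of a SageMaker Pipeline that wraps the existing processing script.
# Role ARN, bucket names, and script path are placeholders.
import sagemaker
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.parameters import ParameterString
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.steps import ProcessingStep

role = "arn:aws:iam::123456789012:role/MySageMakerRole"   # placeholder execution role

# S3 prefix holding the files unloaded from Snowflake; overridable per execution.
input_data = ParameterString(
    name="InputDataUri",
    default_value="s3://my-input-bucket/exports/",         # placeholder bucket
)

processor = SKLearnProcessor(
    framework_version="1.2-1",
    role=role,
    instance_type="ml.m5.xlarge",
    instance_count=1,
)

step_process = ProcessingStep(
    name="ProcessSnowflakeExtract",
    processor=processor,
    inputs=[ProcessingInput(
        source=input_data,
        destination="/opt/ml/processing/input",
    )],
    outputs=[ProcessingOutput(
        source="/opt/ml/processing/output",
        destination="s3://my-output-bucket/processed/",     # placeholder output bucket
    )],
    code="processing_script.py",                            # the already-written Python code
)

pipeline = Pipeline(
    name="snowflake-processing-pipeline",
    parameters=[input_data],
    steps=[step_process],
)
pipeline.upsert(role_arn=role)   # registers (or updates) the pipeline definition
```

My understanding is that once the pipeline is upserted, it can be started from the console, the SDK, or (as in step 2 of my approach) a Lambda function.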
My approach:
- We can create a Snowflake task/job to unload data from the Snowflake tables to AWS S3 (along the lines of the COPY INTO sketch above).
- Assuming the SageMaker pipeline exists, create an AWS Lambda function to trigger it once a file is available in AWS S3 (see the Lambda sketch after this list).
- Once the processed data is available in AWS S3, trigger an Airflow DAG to load it into the Snowflake tables (see the DAG sketch after this list).
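For step 2, this is the kind of Lambda handler I have in mind, wired to an S3 `ObjectCreated` event notification on the export bucket. The pipeline name and the `InputDataUri` parameter match the sketch above; they are assumptions, not working code.

```python
# Sketch of the Lambda handler that starts the SageMaker pipeline when a new
# export file lands in the input bucket. Pipeline/parameter names are placeholders.
import boto3

sm_client = boto3.client("sagemaker")

def lambda_handler(event, context):
    # The S3 event notification carries the bucket and key of the new object.
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    key = record["s3"]["object"]["key"]

    response = sm_client.start_pipeline_execution(
        PipelineName="snowflake-processing-pipeline",
        PipelineParameters=[
            {"Name": "InputDataUri", "Value": f"s3://{bucket}/{key}"},
        ],
    )
    return {"pipelineExecutionArn": response["PipelineExecutionArn"]}
```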
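And for step 3, roughly the DAG shape I had in mind: a sensor waiting on the processed files, followed by a `COPY INTO` into the target table. Connection IDs, bucket, stage, and table names are placeholders for my actual setup.

```python
# Sketch of the Airflow DAG for the final load back into Snowflake.
# Connection IDs, bucket, stage, and table names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor
from airflow.providers.snowflake.operators.snowflake import SnowflakeOperator

with DAG(
    dag_id="load_processed_files_to_snowflake",
    start_date=datetime(2024, 1, 1),
    schedule_interval=None,    # triggered externally rather than on a schedule
    catchup=False,
) as dag:

    wait_for_processed_file = S3KeySensor(
        task_id="wait_for_processed_file",
        bucket_name="my-output-bucket",      # placeholder processed-data bucket
        bucket_key="processed/*",
        wildcard_match=True,
        aws_conn_id="aws_default",
    )

    load_into_snowflake = SnowflakeOperator(
        task_id="load_into_snowflake",
        snowflake_conn_id="snowflake_default",
        sql="""
            COPY INTO my_db.my_schema.target_table
            FROM @my_processed_stage/processed/
            FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
        """,
    )

    wait_for_processed_file >> load_into_snowflake
```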
What I cannot figure out is how to link all of these tools together into one workflow.