
I'm tasked with defining AWS tools for ML development at a medium-sized company. Assume about a dozen ML engineers plus other DevOps staff familiar with serverless (Lambdas and the Serverless Framework). The main questions are: a) what is an architecture that allows for the main tasks related to ML development (creating, training, and fitting models, data pre-processing, hyperparameter optimization, job management, wrapping serverless services, gathering model metrics, etc.), b) what are the main tools that can be used for packaging and deploying things, and c) what are the development tools (IDEs, SDKs, 'frameworks') used for it? I just want to set Jupyter notebooks aside for a second. Jupyter notebooks are great for proofs-of-concept and the closest thing to PowerPoint for management... but I have a problem with notebooks when thinking about deployable units of code.
My intuition points to a preliminary target architecture with 5 parts:

1 - A 'core' with ML models supporting basic model operations (create blank, create pre-trained, train, test/fit, etc.). I foresee core Python scripts here - no problem.

2 - (optional) A 'containerized-set-of-things' that performs hyperparameter optimization and/or model versioning.

3 - A 'contained-unit-of-Python-scripts-around-models' that exposes an API, does job management, and incorporates data pre-processing. It also reads from and writes to S3 buckets.

4 - A 'serverless layer' with a high-level API (in Python). It talks to #3 and/or #1 above.

5 - Some container or bundling thing that will unpack files from Git and deploy them onto various AWS services, creating things from the previous 3 points.

As you can see, my terms are rather fuzzy :) Specific terms will be helpful. My intuition and my preliminary readings say that the answer will likely include a local IDE like PyCharm or Anaconda, or a cloud-based IDE (what can these be? - don't mention notebooks, please). The point I'm not really clear about is #5. Candidates include Amazon SageMaker Components for Kubeflow Pipelines and/or the AWS Step Functions Data Science SDK for SageMaker. It's unclear to me how they can perform #5, however. Kubeflow looks very interesting, but does it have enough adoption, or will it die in 2 years? Are Amazon SageMaker Components for Kubeflow Pipelines and the AWS Step Functions Data Science SDK for SageMaker mutually exclusive? How can each of them help with 'containerizing things' and with basic provisioning and deployment tasks?

1 Answer


It's a long question, but these things all make sense when you're designing ML infrastructure for production. There are three levels that define the maturity of your machine learning process:

1 - CI/CD: here the Docker image goes through stages like build and test, and the versioned training image is pushed to the registry. You can also perform training in these stages and store versioned models using Git references.
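As a rough illustration, here is a minimal Python sketch of that build-tag-push step against ECR, using boto3 and the docker SDK (the repository URI and region are placeholders):

```python
import base64
import subprocess

import boto3
import docker

REPO_URI = "<account-id>.dkr.ecr.us-east-1.amazonaws.com/ml-training"  # placeholder

# Tag the training image with the current Git commit for versioning
git_sha = subprocess.check_output(
    ["git", "rev-parse", "--short", "HEAD"], text=True
).strip()

# Exchange AWS credentials for a temporary ECR login token
ecr = boto3.client("ecr", region_name="us-east-1")
auth = ecr.get_authorization_token()["authorizationData"][0]
user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")

client = docker.from_env()
client.login(username=user, password=password, registry=auth["proxyEndpoint"])

# Build from the repo's Dockerfile, then push the versioned image
client.images.build(path=".", tag=f"{REPO_URI}:{git_sha}")
client.images.push(REPO_URI, tag=git_sha)
```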

2 - Continuous Training: here we deal with the ML pipeline, automating the process of retraining models on new data. It becomes very useful when you have to rerun the whole ML pipeline with new data or a new implementation.

Tools for implementation: Kubeflow Pipelines, SageMaker, Nuclio.
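For example, a minimal Kubeflow Pipelines definition that hands training off to SageMaker could look like the sketch below. The component URL follows the kubeflow/pipelines repo layout, and every bucket, image and role value is a placeholder; check the component.yaml for the exact parameter names in your version:

```python
import json

import kfp
from kfp import components, dsl

# SageMaker training component from the kubeflow/pipelines repo (verify the path)
sagemaker_train_op = components.load_component_from_url(
    "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
    "components/aws/sagemaker/train/component.yaml"
)

@dsl.pipeline(name="retrain", description="Retrain the model on new S3 data")
def retrain_pipeline(train_data="s3://my-bucket/data/train"):
    # Input channel definition passed to the component as JSON
    channels = json.dumps([{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3Uri": str(train_data),
            "S3DataType": "S3Prefix",
            "S3DataDistributionType": "FullyReplicated",
        }},
    }])
    sagemaker_train_op(
        region="us-east-1",
        image="<ecr-training-image-uri>",  # the CI-built training image
        instance_type="ml.m5.xlarge",
        instance_count=1,
        channels=channels,
        model_artifact_path="s3://my-bucket/models/",
        role="<sagemaker-execution-role-arn>",
    )

if __name__ == "__main__":
    kfp.compiler.Compiler().compile(retrain_pipeline, "retrain.yaml")
```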

3 - Continuous Delivery: where, in the cloud or on the edge? In the cloud you can use KFServing, or use SageMaker with Kubeflow Pipelines and deploy the model through SageMaker.
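If you serve on SageMaker, the deployment step boils down to three boto3 calls. A minimal sketch, where all names, image URIs and ARNs are placeholders:

```python
import boto3

sm = boto3.client("sagemaker", region_name="us-east-1")

# Register the trained model artifact with its inference image
sm.create_model(
    ModelName="my-model-v1",
    PrimaryContainer={
        "Image": "<ecr-inference-image-uri>",
        "ModelDataUrl": "s3://my-bucket/models/model.tar.gz",
    },
    ExecutionRoleArn="<sagemaker-execution-role-arn>",
)

# Describe the fleet that will back the endpoint
sm.create_endpoint_config(
    EndpointConfigName="my-model-config-v1",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "my-model-v1",
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
    }],
)

# Spin up the HTTPS endpoint
sm.create_endpoint(
    EndpointName="my-model",
    EndpointConfigName="my-model-config-v1",
)
```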

SageMaker and Kubeflow provide similar functionality, but each has its unique strengths. Kubeflow has the power of Kubernetes: pipelines, portability, caching and artifacts. SageMaker has the power of managed infrastructure, scale-from-zero capability, and AWS services like Athena or Ground Truth.

Solution:

Kubeflow Pipelines standalone + AWS SageMaker (training + model serving) + a Lambda to trigger pipelines from S3 or Kinesis events.
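The triggering Lambda can stay tiny: on an S3 event it just submits a run of the compiled pipeline to the KFP API. A sketch, assuming the kfp package is bundled with the function and the KFP endpoint is reachable through your ingress (the host and file name are placeholders):

```python
import kfp

def handler(event, context):
    # A new object in the training-data bucket kicks off a retraining run
    record = event["Records"][0]
    s3_uri = f"s3://{record['s3']['bucket']['name']}/{record['s3']['object']['key']}"

    # KFP API endpoint exposed by the cluster's ingress / load balancer
    client = kfp.Client(host="http://<kfp-ingress-host>/pipeline")
    client.create_run_from_pipeline_package(
        "retrain.yaml",  # compiled pipeline shipped with the Lambda bundle
        arguments={"train_data": s3_uri},
    )
```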

Infrastructure required:

- Kubernetes cluster (at least one m5 instance)
- MinIO or S3
- Container registry
- SageMaker credentials
- MySQL or RDS
- Load balancer
- Ingress for using the Kubeflow SDK

You've asked about my year-long journey in one question. If you are interested, let's connect :)

Permissions:

Kube --> registry (Read)
Kube --> S3 (Read, Write)
Kube --> RDS (Read, Write)
Lambda --> S3 (Read)
Lambda --> Kube (API Access) 
SageMaker --> S3, Registry
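Each arrow above maps to an IAM policy. As one hedged example, granting the trigger Lambda read access to the data bucket could look like this in boto3 (the role, policy and bucket names are placeholders):

```python
import json

import boto3

iam = boto3.client("iam")

# Minimal inline policy for the Lambda --> S3 (Read) arrow
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::my-training-bucket",
            "arn:aws:s3:::my-training-bucket/*",
        ],
    }],
}

iam.put_role_policy(
    RoleName="kfp-trigger-lambda-role",
    PolicyName="s3-read",
    PolicyDocument=json.dumps(policy),
)
```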

Some good starting guides:

https://github.com/kubeflow/pipelines/tree/master/manifests/kustomize/env/aws

https://aws.amazon.com/blogs/machine-learning/introducing-amazon-sagemaker-components-for-kubeflow-pipelines/

https://github.com/shashankprasanna/kubeflow-pipelines-sagemaker-examples/blob/master/kfp-sagemaker-custom-container.ipynb
