
We are looking for an ML model serving solution with a developer experience where the ML engineers don't need to know DevOps.

Ideally, we are looking for the following ergonomics, or something similar:

  1. Initialize a new model-serving endpoint, preferably via a CLI, and get back a GCS bucket.
  2. Each time we train a new model, we put it in the GCS bucket from step 1 (sketched below).

The serving system guarantees that the most recent model in the bucket is served unless a model is specified by version number.
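
To make step 2 concrete, this is roughly what we would like the training side to look like. The bucket name is made up, and the numeric version directories are just one convention (borrowed from TF Serving) that would satisfy the "latest unless pinned" guarantee:

    import time

    import tensorflow as tf

    # Hypothetical bucket handed back by step 1 (name made up).
    MODEL_BASE = "gs://my-model-bucket/my-endpoint"

    # Stand-in for a real training run.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        tf.keras.layers.Dense(1),
    ])

    # TF Serving convention: each numeric subdirectory under the base
    # path is a model version, and the highest number is served by default.
    version = int(time.time())
    tf.saved_model.save(model, f"{MODEL_BASE}/{version}")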

We are also looking for a service that optimizes cost and latency.

Any suggestions?

Humble Debugger

1 Answer


Have you considered TensorFlow Serving (https://www.tensorflow.org/tfx/serving/architecture)? You can definitely automate the entire workflow using TFX, and I think the guide does a good job of walking through it. Depending on your use case, you may want to use tft (TensorFlow Transform) on its own instead of Kubeflow, which is what they use in that guide.

Besides serving automation, you may also want to consider pipeline automation to separate the feature engineering from the pipeline mechanics. For example, you can build the pipeline, abstract the feature engineering out into a TensorFlow function that meets tft's requirements, and automate the deployment process as well. This way you don't need to deal with the feature specs/schemas manually, and you know that your transformations are the same during serving as they were during training.
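
To illustrate what such a TensorFlow function looks like, here is a minimal sketch of a tf.Transform preprocessing_fn; the feature names are made up:

    import tensorflow_transform as tft

    # Minimal sketch of a preprocessing_fn in the shape tf.Transform
    # expects: a dict of raw feature tensors in, a dict of transformed
    # feature tensors out. Feature names ("x", "label") are placeholders.
    def preprocessing_fn(inputs):
        return {
            # Analyzers like scale_to_z_score compute statistics over the
            # full training dataset; the resulting transform graph is
            # exported with the model, so serving applies exactly the
            # same transformation that was learned during training.
            "x_scaled": tft.scale_to_z_score(inputs["x"]),
            "label": inputs["label"],
        }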

You can do the same thing with scikit-learn, and I believe serving scikit-learn models is also supported under the Vertex AI umbrella.
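
As a rough sketch of what that could look like with the Vertex AI Python SDK (the project, region, artifact path, and container image tag below are all placeholders; check the Vertex AI docs for the current prebuilt scikit-learn images):

    from google.cloud import aiplatform

    # All identifiers below are placeholders.
    aiplatform.init(project="my-project", location="us-central1")

    # Upload a trained model from GCS using a prebuilt serving container,
    # then deploy it behind a managed endpoint.
    model = aiplatform.Model.upload(
        display_name="my-sklearn-model",
        artifact_uri="gs://my-model-bucket/sklearn/",  # contains model.joblib
        serving_container_image_uri=(
            "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
        ),
    )
    endpoint = model.deploy(machine_type="n1-standard-2")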

To your point about latency, you definitely want the pipeline doing the transformations on the GPU; for that reason, I would recommend TensorFlow over something like scikit-learn if the use case is truly time-sensitive.
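
For example, one way to keep the transformations on the accelerator is to bake the preprocessing into the model graph itself; a minimal sketch with made-up shapes:

    import numpy as np
    import tensorflow as tf

    # Toy stand-in for training features; shapes are made up.
    train_x = np.random.rand(1000, 4).astype("float32")

    # A Normalization layer inside the model means the scaling runs on
    # the same device (e.g. the GPU) as inference, rather than as a
    # separate CPU-bound preprocessing step.
    normalizer = tf.keras.layers.Normalization()
    normalizer.adapt(train_x)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(4,)),
        normalizer,  # preprocessing baked into the serving graph
        tf.keras.layers.Dense(1),
    ])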

Best of luck!

Pritam Dodeja
  • Thanks @pritamdodeja. Could you elaborate on what you meant by tft vs. Kubeflow? – Humble Debugger May 22 '22 at 16:22
  • The term 'pipeline' is very overloaded. You can think of a pipeline as the set of steps that transform the data (the tft view). Or you can look at a pipeline as the entire provisioning of all the dependencies, including the orchestration of containers etc.; this is the bigger definition, and it is how Kubeflow and TFX look at it. Reproducibility risk comes not just from the transformations (highest prob.) but also from the data (next highest, imo), the underlying runtimes, etc. Kubeflow and TFX address those risks more comprehensively, but I think tft is the first step. – Pritam Dodeja May 22 '22 at 20:09
  • @HumbleDebugger I just created a repo that might be useful to you in terms of a cleaner separation of concerns. You can find it at https://github.com/pritamdodeja/tft_tasks – Pritam Dodeja May 23 '22 at 00:50
  • Here's something else that speaks more comprehensively to the point we were discussing: https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning – Pritam Dodeja May 23 '22 at 14:52