Kubernetes Machine Learning Model Serving

Question

Is there a suggested way to serve hundreds of machine learning models in Kubernetes? Solutions like KFServing seem better suited to cases where a single trained model, or a few versions of it, serves all requests. For instance, a typeahead model that is universal across all users.

But is there a suggested way to serve hundreds or thousands of such models? For example, a typeahead model trained specifically on each user's data.

The most naive way to achieve this would be for each typeahead serving container to maintain a local in-memory cache of models. But then scaling to multiple pods would be a problem, because each cache is local to its pod, so each request would need to be routed to the pod that has loaded the right model.
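To make the naive approach concrete, here is a minimal sketch of a per-pod cache with least-recently-used eviction. The `ModelCache` class and the `load_model` callable are hypothetical names used only for illustration; a real loader would probably fetch the user's model from some shared storage.

```python
from collections import OrderedDict
from typing import Any, Callable


class ModelCache:
    """Per-pod LRU cache mapping a user ID to that user's loaded model."""

    def __init__(self, load_model: Callable[[str], Any], capacity: int = 100) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        # load_model is an assumed callable that fetches and deserializes
        # the model for one user (for example from object storage).
        self._load_model = load_model
        self._capacity = capacity
        self._models = OrderedDict()  # least recently used entry comes first

    def __contains__(self, user_id: str) -> bool:
        return user_id in self._models

    def get(self, user_id: str) -> Any:
        """Return the user's model, loading it on a cache miss."""
        if user_id in self._models:
            # Cache hit: mark the model as most recently used.
            self._models.move_to_end(user_id)
            return self._models[user_id]
        # Cache miss: load the model, then evict the oldest entry if full.
        model = self._load_model(user_id)
        self._models[user_id] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)
        return model
```

Each pod would hold its own `ModelCache`, which is exactly why a request landing on a different pod misses the cache and reloads the model.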

Also, having to maintain a registry that tracks which pod has loaded which model, and to update it on every model eviction, seems like a lot of work.
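The bookkeeping such a registry involves could look roughly like the sketch below. `ModelRegistry` and its method names are hypothetical, and this in-process version only illustrates the operations; in a real cluster the mapping would have to live in a store that all pods and the router can reach.

```python
from typing import Dict, List, Set


class ModelRegistry:
    """Tracks which pods currently hold which models (illustrative only)."""

    def __init__(self) -> None:
        self._pods_by_model: Dict[str, Set[str]] = {}

    def register(self, model_id: str, pod: str) -> None:
        """Record that a pod has loaded a model."""
        self._pods_by_model.setdefault(model_id, set()).add(pod)

    def evict(self, model_id: str, pod: str) -> None:
        """Record that a pod has evicted a model from its cache."""
        pods = self._pods_by_model.get(model_id)
        if pods is None:
            return
        pods.discard(pod)
        if not pods:
            del self._pods_by_model[model_id]

    def remove_pod(self, pod: str) -> None:
        """Drop every entry for a pod that was terminated or rescheduled."""
        for model_id in list(self._pods_by_model):
            self.evict(model_id, pod)

    def lookup(self, model_id: str) -> List[str]:
        """Return the pods that have the model loaded, in sorted order."""
        return sorted(self._pods_by_model.get(model_id, set()))
```

A router would call `lookup` for every request and fall back to any pod when the list is empty, while every cache load, eviction, and pod restart would need a matching registry update.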

score 1 · Answer 1 · answered Feb 24 '20 at 09:42

You can look at Catwalk, the model serving platform built at Grab.

Grab has a tremendous amount of data that we can leverage to solve complex problems such as fraudulent user activity, and to provide our customers personalized experiences on our products. One of the tools we are using to make sense of this data is machine learning (ML).

That is how Catwalk was created: an easy-to-use, self-serve, machine learning model serving platform for everyone at Grab.

You can find more information about Catwalk here: Catwalk.

You can serve multiple Machine Learning models using TensorFlow and Google Cloud.

The reason the field of machine learning is experiencing such an epic boom is because of its real potential to revolutionize industries and change lives for the better. Once machine learning models have been trained, the next step is to deploy these models into usage, making them accessible to those who need them — be they hospitals, self-driving car manufacturers, high-tech farms, banks, airlines, or everyday smartphone users. In production, the stakes are high and one cannot afford to have a server crash, connection slow down, etc. As our customers increase their demand for our machine learning services, we want to seamlessly meet that demand, be it at 3AM or 3PM. Similarly, if there is a decrease in demand we want to scale down the committed resources so as to save cost, because as we all know, cloud resources are very expensive.

You can find more information here: machine-learning-serving.

You can also use Seldon.

Seldon Core is an open source platform for deploying machine learning models on a Kubernetes cluster.

Features:

- deploying machine learning models in the cloud or on-premise.
- gaining metrics and ensuring proper governance and compliance for your running machine learning models.
- creating inference graphs made up of multiple components.
- providing a consistent serving layer for models built using heterogeneous ML toolkits.

Useful documentation: Kubernetes-Machine-Learning.
