We are trying to figure out how to host and run many of our existing scikit-learn and R models (as is) on GCP. ML Engine seems to be specific to TensorFlow. How can I train a scikit-learn model on Google Cloud Platform and manage it if the dataset is too large to pull into Datalab? Can I still use ML Engine, or is there a different approach most people take?

As an update: I was able to run the Python script that trains the scikit-learn model by submitting it as a training job to ML Engine, but I haven't found a way to host the pickled model or use it for prediction.

dobbysock1002
  • That depends on your use case. What is a large dataset? Can it fit into the memory of the Datalab instance if you make the instance larger? That's the simplest solution. If it doesn't, it might be time to look at more suitable tools, like TF Learn or perhaps Spark MLlib. – Robert Lacok Apr 25 '17 at 13:30

3 Answers

Cloud ML Engine only supports models written in TensorFlow.

If you're using scikit-learn, you might want to look at some of the higher-level TensorFlow libraries like TF Learn or Keras. They might help you migrate your model to TensorFlow, in which case you could then use Cloud ML Engine.

Jeremy Lewi

It's possible. Cloud ML has had this feature since Dec 2017; as of today it is offered as early access. The Cloud ML team is still testing the feature, but you can sign up to take part. More details here.

Use the following command to deploy your scikit-learn models to Cloud ML. Note that these parameters may change in the future.

gcloud ml-engine versions create ${MODEL_VERSION} --model=${MODEL} --origin="gs://${MODEL_PATH_IN_BUCKET}" --runtime-version="1.2" --framework="SCIKIT_LEARN"
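
Before running the command above, the pickled model has to sit in the bucket directory passed as --origin. The sketch below shows one way to write that file locally. It assumes the service looks for a file named model.pkl (or model.joblib if you use joblib instead); check the current docs before relying on that. The function name export_model and the placeholder object are hypothetical and only stand in for your own fitted estimator.

```python
import os
import pickle
import tempfile


def export_model(model, export_dir):
    """Pickle `model` as model.pkl inside `export_dir` and return the file path.

    The file name is an assumption about what the hosted service expects.
    """
    os.makedirs(export_dir, exist_ok=True)
    path = os.path.join(export_dir, "model.pkl")
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return path


# Placeholder only: swap in a fitted scikit-learn estimator,
# e.g. the object returned by your_estimator.fit(X, y).
placeholder_model = {"note": "replace with a fitted scikit-learn estimator"}

export_dir = tempfile.mkdtemp()
model_path = export_model(placeholder_model, export_dir)
print("Wrote", model_path)
```

After that, copy the file into the bucket directory (for example with gsutil cp) so that ${MODEL_PATH_IN_BUCKET} points at the directory containing it.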

Kishore Karunakaran

sklearn is now supported on ML Engine.

Here is a fully worked-out example of fully managed scikit-learn training, online prediction, and hyperparameter tuning:

https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/blogs/sklearn/babyweight_skl.ipynb
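
For the online prediction part, a minimal sketch of preparing the input is shown below. It assumes the deployed scikit-learn model accepts each instance as a flat list of numeric features, and that the resulting file is passed to gcloud ml-engine predict via --json-instances, one JSON instance per line. The helper name write_instances and the feature values are made up for illustration and are not taken from the notebook.

```python
import json
import os
import tempfile


def write_instances(rows, path):
    """Write one JSON-encoded feature list per line (newline-delimited JSON)."""
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(list(row)) + "\n")
    return path


# Hypothetical feature rows; the order must match what the model was trained on.
sample_rows = [
    [1.0, 25.0, 39.0],
    [0.0, 32.0, 40.0],
]

instances_path = write_instances(
    sample_rows, os.path.join(tempfile.mkdtemp(), "instances.json")
)
print("Wrote", instances_path)
```

The file can then be supplied to the predict command together with the model and version names used at deployment time.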

Lak
  • Does anyone know if the scikit-learn support can handle custom dependencies and, if so, how to implement this? – roman Jul 06 '18 at 15:16
  • Yes, just add the dependencies to setup.py – Lak Jul 07 '18 at 16:19
  • Thanks, can you point to any resources (blogs, examples, docs) on using custom dependencies with ML Engine + scikit-learn? – roman Jul 20 '18 at 09:58
  • Here is an example of a setup.py that lists three custom dependencies ('pandas-gbq==0.4.1', 'urllib3', 'google-cloud-bigquery'): https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/blogs/pandas-pvtkey/setup.py – Lak Jul 22 '18 at 06:24