
What is the best way to migrate a Jupyter notebook into Google Cloud Platform?

Requirements

  • I don't want to make many changes to the notebook to get it to run
  • I want it to be schedulable, preferably through the UI
  • I want it to run an .ipynb file, not a .py file
  • In AWS, SageMaker seems like the no-brainer solution for this. I want the GCP tool that comes closest to this specific task without a lot of extras

I've tried the following:

  • Cloud Functions: seems best suited to running Python scripts, not notebooks, and by default requires a main.py entry point

  • Dataproc: you can add a notebook to a running cluster, but it cannot be scheduled

  • Dataflow: seemed like overkill; it didn't feel like the right tool, being better suited to Apache Beam-based pipelines
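For context on why the Cloud Functions route feels awkward: an .ipynb file is just JSON containing code cells, so a main.py shim ends up re-implementing (or delegating to a tool like papermill) something like the following. This is an illustrative, standard-library-only sketch, not how any GCP service actually executes notebooks:

```python
import json


def run_notebook(path):
    """Execute the code cells of a .ipynb file in a shared namespace.

    Illustrative only: real executors (papermill, nbclient) also capture
    cell outputs, handle IPython magics, and write a result notebook.
    """
    with open(path) as f:
        nb = json.load(f)
    namespace = {}
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            # Cell source is stored as a list of lines in notebook JSON
            source = "".join(cell.get("source", []))
            exec(source, namespace)  # run each code cell in order
    return namespace
```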

I feel like this question should be easier. I found this article on the subject:

How to Deploy and Schedule Jupyter Notebook on Google Cloud Platform

He doesn't actually do what the title says: he moves a lot of GCP code into a main.py to create an instance, and has the instance execute the notebook.

Feel free to correct my perspective on any of this

user3738936

1 Answer


I use Vertex AI Workbench to run notebooks on GCP. It provides two variants:

  • Managed Notebooks
  • User-managed Notebooks

User-managed notebooks create Compute Engine instances in the background; they come with pre-installed packages such as JupyterLab and Python, and they allow customization. I mainly use them for developing Dataflow pipelines.

As for your other requirement, scheduling: Managed Notebooks supports this feature; see this documentation (I have yet to try Managed Notebooks):

Use the executor to run a notebook file as a one-time execution or on a schedule. Choose the specific environment and hardware that you want your execution to run on. Your notebook's code will run on Vertex AI custom training, which can make it easier to do distributed training, optimize hyperparameters, or schedule continuous training jobs. See Run notebook files with the executor.

You can use parameters in your execution to make specific changes to each run. For example, you might specify a different dataset to use, change the learning rate on your model, or change the version of the model.

You can also set a notebook to run on a recurring schedule. Even while your instance is shut down, Vertex AI Workbench will run your notebook file and save the results for you to look at and share with others.
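The parameterized runs described above work papermill-style: a code cell tagged `parameters` holds the defaults, and the runner injects a new cell right after it that overrides them. The mechanics below are an assumption based on papermill's documented convention, sketched with only the standard library; the executor's actual implementation may differ:

```python
import copy


def inject_parameters(nb, params):
    """Return a copy of a notebook dict with an injected code cell that
    overrides variables defined in the cell tagged 'parameters'.

    Sketch of the papermill-style convention; not the Vertex AI executor's
    actual code.
    """
    nb = copy.deepcopy(nb)
    # Build a code cell assigning each override, e.g. learning_rate = 0.01
    lines = [f"{name} = {value!r}\n" for name, value in params.items()]
    injected = {
        "cell_type": "code",
        "metadata": {"tags": ["injected-parameters"]},
        "source": lines,
        "outputs": [],
        "execution_count": None,
    }
    for i, cell in enumerate(nb.get("cells", [])):
        if "parameters" in cell.get("metadata", {}).get("tags", []):
            nb["cells"].insert(i + 1, injected)  # right after the defaults
            break
    else:
        nb["cells"].insert(0, injected)  # no tagged cell: put overrides first
    return nb
```

Because the overrides land in their own cell after the defaults, each scheduled run can point at a different dataset or learning rate without editing the notebook itself.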

Rathish Kumar B
  • It seems like the scheduling/executor only works for Managed Notebooks. Can you schedule notebooks in Dataflow? https://stackoverflow.com/questions/74513285/vertex-ai-does-not-allow-you-to-schedule-via-executor?noredirect=1#comment131568976_74513285 – user3738936 Nov 28 '22 at 17:49