4

Is it possible to run code located in Google Cloud Datalab on Dataproc clusters?

The idea is to use the great interactivity and interface by Datalab to run Apache Spark code.

jaycode
  • 2,926
  • 5
  • 35
  • 71

2 Answers2

3

This is on our radar but not yet fully enabled as an init action for a Dataproc cluster.

Thanks. Dinesh Kulkarni Product Manager, Datalab & Machine Learning, GCP

Dinesh
  • 421
  • 2
  • 7
1

Now it is possible, just create a dataproc cluster using this command:

gcloud dataproc clusters create $CLUSTERNAME \
    --project $PROJECT \
    --num-workers $WORKERS \
    --bucket $BUCKET \
    --metadata startup-script-url=gs://$BUCKET/setup/setup_env.sh,BUCKET=$BUCKET \
    --master-machine-type $VMMASTER \
    --worker-machine-type $VMWORKER \
    --initialization-actions \
         gs://dataproc-initialization-actions/datalab/datalab.sh \
    --scopes cloud-platform

To make it even easier you can use this script: https://github.com/kanjih-ciandt/script-dataproc-datalab/tree/master

hkanjih
  • 1,271
  • 1
  • 11
  • 29