Is it possible to run code located in Google Cloud Datalab on Dataproc clusters?
The idea is to use Datalab's interactive notebook interface to run Apache Spark code.
This is on our radar but not yet fully enabled as an init action for a Dataproc cluster.
Thanks, Dinesh Kulkarni (Product Manager, Datalab & Machine Learning, GCP)
Now it is possible; just create a Dataproc cluster using this command:
gcloud dataproc clusters create $CLUSTERNAME \
    --project $PROJECT \
    --num-workers $WORKERS \
    --bucket $BUCKET \
    --metadata startup-script-url=gs://$BUCKET/setup/setup_env.sh,BUCKET=$BUCKET \
    --master-machine-type $VMMASTER \
    --worker-machine-type $VMWORKER \
    --initialization-actions gs://dataproc-initialization-actions/datalab/datalab.sh \
    --scopes cloud-platform
To make it even easier, you can use this script: https://github.com/kanjih-ciandt/script-dataproc-datalab/tree/master
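
Once the cluster is up, the Datalab init action serves the notebook UI from the cluster's master node (port 8080 by default). A minimal sketch of reaching it through an SSH tunnel with a SOCKS proxy follows; the local port 1080, the Chrome invocation, and the $ZONE variable are illustrative choices, not part of the command above:

# Open a SOCKS proxy to the master node (Dataproc names it $CLUSTERNAME-m);
# keep this running in a separate terminal while you use the notebook
gcloud compute ssh ${CLUSTERNAME}-m --zone=${ZONE} -- -D 1080 -N

# Launch a browser that routes traffic through the proxy and open Datalab on port 8080
/usr/bin/google-chrome \
    --proxy-server="socks5://localhost:1080" \
    --user-data-dir=/tmp/${CLUSTERNAME}-datalab \
    http://${CLUSTERNAME}-m:8080

From the notebook you should then be able to run PySpark code against the cluster, since the init action configures Datalab on the master to use the cluster's Spark installation.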