
Is it possible to compile and build a custom Apache Spark on Google Cloud Dataproc? Let's say we want to tweak Apache Spark and then build that custom Spark on Dataproc.

1 Answer


This should be possible. Dataproc spins up normal Compute Engine VMs for you and sets up Hadoop and YARN.

You can log in to any machine, install your custom Spark build, and point it at Dataproc's YARN and Hadoop installation. However, I doubt you will be able to use the GCP console or the Dataproc command-line interface to submit and monitor jobs with your own installation without further modification.
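A minimal sketch of what that could look like on a single node, assuming the custom build has been staged as a tarball in GCS. The bucket path, tarball name, and install directory below are placeholders; /etc/hadoop/conf is where Dataproc images normally keep the Hadoop/YARN configuration:

```bash
# Hedged sketch: install a custom Spark build on one node and point it at
# Dataproc's existing Hadoop/YARN. GCS path and install dir are placeholders.
gsutil cp gs://my-bucket/spark-custom.tgz /tmp/spark-custom.tgz
sudo mkdir -p /opt/spark-custom
sudo tar -xzf /tmp/spark-custom.tgz -C /opt/spark-custom --strip-components=1

# Tell the custom build where the cluster's Hadoop/YARN configuration lives.
export SPARK_HOME=/opt/spark-custom
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf

# Submit a job on YARN with the custom build instead of the pre-installed Spark.
"$SPARK_HOME"/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    "$SPARK_HOME"/examples/jars/spark-examples_*.jar 100
```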

O. Gindele
  • You probably want to use an init action to make sure all nodes are using your version of Spark and not trying to load the pre-installed Spark. Init actions are basically arbitrary scripts that run on all nodes. https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/init-actions – Karthik Palaniappan Feb 27 '18 at 23:24
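For completeness, a hypothetical init action along those lines: a script staged in GCS that Dataproc runs on every node at cluster creation. The tarball location and install path are placeholders, not anything Dataproc provides:

```bash
#!/bin/bash
# Hypothetical Dataproc init action: runs on every node at cluster creation.
# Installs a custom Spark build staged in GCS; bucket and paths are placeholders.
set -euxo pipefail

gsutil cp gs://my-bucket/spark-custom.tgz /tmp/spark-custom.tgz
mkdir -p /opt/spark-custom
tar -xzf /tmp/spark-custom.tgz -C /opt/spark-custom --strip-components=1

# Put the custom build first on PATH and reuse Dataproc's Hadoop/YARN config.
cat <<'EOF' > /etc/profile.d/spark-custom.sh
export SPARK_HOME=/opt/spark-custom
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export PATH="$SPARK_HOME/bin:$PATH"
EOF
```

The cluster would then be created with something like:

```bash
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --initialization-actions=gs://my-bucket/install-custom-spark.sh
```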