
The Dataproc clusters I created always show a status of "Running" in the web portal. Is there a way to stop/deprovision a cluster when it is not in use so that it does not burn resources and $$?

asked by sermolin, edited by Igor Dvorzhak
  • Yannick MG has the correct answer, but here are two more things you might be interested in: scheduled cluster deletion (https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/scheduled-deletion) and workflow templates (https://cloud.google.com/dataproc/docs/concepts/workflows/overview). Scheduled deletion lets you create rules like "delete my cluster after 2 hours" or "delete my cluster if it hasn't run jobs for 10 minutes". Workflows let you run a set of jobs on a dedicated cluster that is torn down after the jobs complete. – Karthik Palaniappan Feb 05 '18 at 20:30
  • When a cluster sits idle with no jobs submitted to it, does a customer get billed for the use of resources (CPUs, disk, network, etc.)? – sermolin Feb 07 '18 at 02:18
  • @sermolin Yes, you are billed for all resources that are reserved for your use, regardless of usage levels. – Yannick MG Feb 07 '18 at 14:02
  • An alternative way to pause/stop a Dataproc cluster: https://stackoverflow.com/questions/34558427/pausing-dataproc-cluster-google-compute-engine – Donny Nov 19 '18 at 19:52
  • Does this answer your question? [Pausing Dataproc cluster - Google Compute engine](https://stackoverflow.com/questions/34558427/pausing-dataproc-cluster-google-compute-engine) – Igor Dvorzhak Dec 15 '20 at 23:14
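
For reference, the scheduled-deletion rules Karthik mentions are set with flags at cluster-creation time. A minimal sketch, where "my-cluster" and "us-central1" are placeholders:

    # Create a cluster that deletes itself after 2 hours of life,
    # or after 10 minutes without any submitted jobs, whichever comes first.
    gcloud dataproc clusters create my-cluster \
        --region=us-central1 \
        --max-age=2h \
        --max-idle=10m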

4 Answers


We recently launched the ability to stop and start clusters.

https://cloud.google.com/dataproc/docs/guides/dataproc-start-stop
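
A minimal sketch of the corresponding gcloud commands ("my-cluster" and "us-central1" are placeholders):

    # Stop a running cluster: its VMs shut down, but the cluster
    # configuration and persistent disks are kept (disks are still billed).
    gcloud dataproc clusters stop my-cluster --region=us-central1

    # Start it again when needed.
    gcloud dataproc clusters start my-cluster --region=us-central1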

– Mikayla Konst

As described in the Dataproc documentation, you can delete a running Dataproc cluster either by choosing the "Delete" option from the Dataproc dashboard, by running the Cloud SDK command gcloud dataproc clusters delete cluster-name, or by calling the clusters.delete REST method.
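
For example, with the Cloud SDK ("my-cluster" and "us-central1" are placeholders):

    # Permanently delete the cluster and release all of its resources.
    gcloud dataproc clusters delete my-cluster --region=us-central1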

– Yannick MG
  • Well, that's regrettable. Azure does have a STOP option and it is very handy. Consider this use case: I am debugging an issue. I deployed a cluster and maybe made some modifications on the head node (installed a Spark package I needed, changed /etc/environment, ...). I got interrupted, or my counterpart in Shanghai or Bangalore is unavailable and I need to wait for them to come online. What do I do? Delete the cluster and all the work I have done, and recreate it again the next time? The Azure "STOP" button allows me to release CPUs but keeps disks, network configs, etc., so I am charged only for those. Smart. – sermolin Mar 15 '18 at 01:33
  • Well, one thing you can do is [create a feature request](https://issuetracker.google.com/issues/new?component=187133) on the Dataproc issue tracker and encourage others who desire the feature to star it to show their interest, so that Google knows you want this. Be sure to detail how having to delete clusters instead of stopping them is impacting you and how you'd like to see that feature implemented. – Yannick MG Mar 15 '18 at 12:39
  • @sermolin I think Compute Engine VM instances are what you need. You can start/stop Compute Engine VM instances: create a custom instance, install whatever you want on it, and then start it only when needed. – gSorry Jun 25 '19 at 13:51

You can achieve the Azure "STOP" functionality you describe in GCP as well. It is just not a single button click: you need to go to Compute Engine and stop all the VMs associated with your cluster. You will then be charged only for the disk space used by the cluster.
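
A minimal sketch with gcloud, assuming the default Dataproc node naming (a master "cluster-name-m" and workers "cluster-name-w-0", "cluster-name-w-1", ...); all names and the zone below are placeholders:

    # Stop every VM in the cluster; you stop paying for compute,
    # but persistent disks are still billed.
    gcloud compute instances stop my-cluster-m my-cluster-w-0 my-cluster-w-1 \
        --zone=us-central1-a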


Alternatively, in the console: go to Compute Engine -> VM instances and stop each node of the cluster.
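
When you need the cluster again, start the same instances (same placeholder names as above); data on their persistent disks is preserved:

    # Restart the stopped nodes.
    gcloud compute instances start my-cluster-m my-cluster-w-0 my-cluster-w-1 \
        --zone=us-central1-a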