2

We use Spark 2.2 on Azure HDInsight for ad hoc exploration and batch jobs.

The jobs should run fine on a cluster of five medium VMs. They are (1) notebooks (Zeppelin with Livy.spark2 magics) and (2) compiled jars run through Livy.

I have to remember to scale this cluster down to 1 worker when I'm not using it, to save money. (Zero workers would be nice, if that were possible.)

I'd like Spark to manage this for me: when a job starts, scale the cluster up to a minimum size first, then pause for about 10 minutes while that completes. After an idle period with no jobs, scale down again.

Iain

2 Answers

1

You can use PowerShell or the Azure classic CLI to scale the cluster up or down. However, you may need to write a script that tracks the cluster's resource usage and scales it down automatically.

Here is the PowerShell syntax:

Set-AzureRmHDInsightClusterSize -ClusterName <Cluster Name> -TargetInstanceCount <NewSize>
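
A script that tracks usage needs a rule for when to change the size. Here is a minimal sketch of such a rule in Python, not a tested production implementation. The names `ScalePolicy` and `target_worker_count` are hypothetical, the 5-worker and 1-worker sizes come from the question, and the 600-second idle timeout is an assumption. How you count active jobs (for example, by polling Livy) is left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalePolicy:
    """Hypothetical policy values; adjust to your own cluster."""
    min_active_workers: int = 5     # size wanted while jobs run (from the question)
    idle_workers: int = 1           # size to fall back to when idle (from the question)
    idle_timeout_s: float = 600.0   # assumed idle period before scaling down


def target_worker_count(
    policy: ScalePolicy,
    current_workers: int,
    active_jobs: int,
    idle_since: Optional[float],
    now: float,
) -> Optional[int]:
    """Return the worker count to request, or None if no change is needed.

    idle_since is the timestamp (in seconds) at which the cluster last
    became idle, or None if that has not been recorded yet.
    """
    if active_jobs > 0:
        # Jobs are running: make sure the cluster is at least the minimum size.
        if current_workers < policy.min_active_workers:
            return policy.min_active_workers
        return None

    if idle_since is None:
        return None

    # No jobs: scale down only after the idle timeout has fully elapsed.
    idle_for = now - idle_since
    if idle_for >= policy.idle_timeout_s and current_workers > policy.idle_workers:
        return policy.idle_workers
    return None
```

When the function returns a number, that value is what you would pass as -TargetInstanceCount to the command above. Keeping the decision separate from the scaling call makes the rule easy to test without touching a real cluster.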

Here is a PowerShell workflow runbook that helps you automate scaling your HDInsight clusters in or out, depending on your needs:

https://gallery.technet.microsoft.com/scriptcenter/Scale-your-HDInsight-f57bb4d8

or

You can use the option below to scale the cluster manually (your question is about automatic scaling, but this may be useful to someone who wants to scale manually):

[Screenshot: scaling manually]

Below is a link to an article that explains the different ways to scale the cluster using PowerShell or the classic CLI (remember: the latest CLI doesn't support the scaling feature):

https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-scaling-best-practices

If you want Spark to handle it dynamically, Azure Databricks is the best choice (but it provides only a Spark cluster, with no Hadoop components except Hive). HDInsight Spark is not an Azure managed service, so it will not solve your use case.

Below is a screenshot of the new-cluster form in Azure Databricks. I have highlighted the "enable auto scaling" option, which lets the cluster scale dynamically while a job runs.

[Screenshot: Azure Databricks new-cluster form with the "enable auto scaling" option highlighted]

Anandkumar
0

I'm told that Azure Databricks may be a better solution for this use case.

Iain