
I have an ADF pipeline with a Databricks activity.

The activity creates a new job cluster every time, and I have added all the required Spark configurations to the corresponding linked service.

Now that Databricks offers Spot Instances, I'd like to create my new clusters with Spot configurations within Databricks.

I tried to find help in the linked service docs, but no luck!

How can I do this using ADF?

Cheers!!!

mani_nz

3 Answers


I found another workaround to enable the ADF Databricks linked service to create job clusters with spot instances. As Alex Ott mentioned, the azure_attributes cluster property isn't supported by the Databricks linked service interface.

Instead, I ended up creating a cluster policy that enforces spot instances:

{
  "azure_attributes.availability": {
    "type": "fixed",
    "value": "SPOT_WITH_FALLBACK_AZURE",
    "hidden": true
  }
}

You can extend that policy if you want to control other properties of the azure_attributes object. Also, make sure you set the policy permissions for the appropriate groups/users.

After creating the policy you will need to retrieve the policy id. I used a REST call to the 2.0/policies/clusters/list endpoint to get that value.
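
For reference, here is a minimal Python sketch of that list call, plus the permissions step mentioned above; the environment variable names, the policy name "Spot instances", and the group name "data-engineers" are all illustrative assumptions, not values from the original setup:

import os
import requests

# Illustrative assumptions: workspace URL and personal access token come
# from environment variables.
host = os.environ["DATABRICKS_HOST"]    # e.g. https://some-url.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]
headers = {"Authorization": f"Bearer {token}"}

# List all cluster policies and pick out the one created above.
# "Spot instances" is a placeholder - match on whatever you named your policy.
resp = requests.get(f"{host}/api/2.0/policies/clusters/list", headers=headers)
resp.raise_for_status()
policy_id = next(
    p["policy_id"]
    for p in resp.json().get("policies", [])
    if p["name"] == "Spot instances"
)

# Grant a group CAN_USE on the policy via the Permissions API
# ("data-engineers" is an illustrative group name).
resp = requests.put(
    f"{host}/api/2.0/permissions/cluster-policies/{policy_id}",
    headers=headers,
    json={"access_control_list": [
        {"group_name": "data-engineers", "permission_level": "CAN_USE"}
    ]},
)
resp.raise_for_status()

print(policy_id)  # the value to use for policyId in the linked service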

From there, you can do what Alex Ott suggested: create the linked service using the dynamic JSON option and add the policyId property, with the appropriate policy ID, to the typeProperties object:

"typeProperties": {
  "domain": "Your Domain",
  "newClusterNodeType": "@linkedService().ClusterNodeType",
  "newClusterNumOfWorker": "@linkedService().NumWorkers",
  "newClusterVersion": "7.3.x-scala2.12",
  "newClusterInitScripts": [],
  "newClusterDriverNodeType": "@linkedService().DriverNodeType",
  "policyId": "Your policy id",
}

Now when you invoke your ADF pipeline, it will create a job cluster that uses the cluster policy to restrict the availability property of azure_attributes to whatever you specified.

Devin Dewitt

I'm not sure that it's possible right now, as it requires specifying the azure_attributes parameters when creating the cluster. But there should be a workaround: create an instance pool of spot instances and specify that pool via the instancePoolId property.

Update: it really works. The only drawback is that you need to use JSON to configure the linked service (but it's possible to configure everything visually, save, grab the JSON from the Git repository, and update it with the required parameters). The basic steps are as follows:

  • Configure an instance pool to use spot instances (a REST sketch of this step follows the list below):

[Screenshot: instance pool configuration with spot instances enabled]

  • Configure the Databricks linked service to use the instance pool:
{
    "name": "DBName",
    "type": "Microsoft.DataFactory/factories/linkedservices",
    "properties": {
        "annotations": [],
        "type": "AzureDatabricks",
        "typeProperties": {
            "domain": "https://some-url.azuredatabricks.net",
            "newClusterNodeType": "Standard_DS3_v2",
            "newClusterNumOfWorker": "5",
            "instancePoolId": "<your-pool-id>",
            "newClusterSparkEnvVars": {
                "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
            },
            "newClusterVersion": "8.2.x-scala2.12",
            "newClusterInitScripts": [],
            "encryptedCredential": "some-base-64"
        }
    }
}
  • Configure an ADF pipeline with the job to execute, just as usual.

  • Trigger the ADF pipeline and, after several minutes, see that the instance pool is used:

[Screenshot: job cluster running on the instance pool]
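
For completeness, here is a minimal Python sketch of the first step done through the Instance Pools REST API instead of the UI; the pool name, node type, and environment variable names are illustrative assumptions:

import os
import requests

host = os.environ["DATABRICKS_HOST"]    # illustrative, e.g. https://some-url.azuredatabricks.net
token = os.environ["DATABRICKS_TOKEN"]  # illustrative personal access token

resp = requests.post(
    f"{host}/api/2.0/instance-pools/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "instance_pool_name": "adf-spot-pool",   # assumed name
        "node_type_id": "Standard_DS3_v2",
        "azure_attributes": {
            # SPOT_AZURE = spot only; SPOT_WITH_FALLBACK_AZURE falls back to on-demand
            "availability": "SPOT_AZURE",
            # -1 means bid up to the current on-demand price
            "spot_bid_max_price": -1,
        },
    },
)
resp.raise_for_status()

# The returned ID is the value for "instancePoolId" in the linked service JSON.
print(resp.json()["instance_pool_id"])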

Alex Ott
  • Wow, sounds promising. I'll try this, thanks @AlexOtt – mani_nz May 06 '21 at 23:10
  • @mani_nz it really works - see the updated answer – Alex Ott May 11 '21 at 12:32
  • Would this create driver nodes in the pool as well? If yes, then I risk my driver nodes getting evicted and the job crashing, right? – mani_nz May 27 '21 at 02:15
  • @mani_nz yes, the driver node will be on spot as well. But the latest platform release has a feature called "Hybrid Pools" that allows the driver to be in another pool that could be created from on-demand nodes – Alex Ott Jul 09 '21 at 06:21
  • Hello @AlexOtt, where can I get these values: "name": "DBName(?)" and "encryptedCredential": "some-base-64(?)"? Thanks in advance. – Anil Kumar Jan 28 '22 at 08:47
  • Refer to the accepted answer - you need to use dynamic attributes. `DBName` is just the name of the linked service – Alex Ott Jan 28 '22 at 09:14

Use the ADF linked service option shown below to create a Spot Instance:

[Screenshot: ADF Databricks linked service configuration options]

Sandy
  • Author is asking about using Azure Spot VMs: https://learn.microsoft.com/en-us/azure/databricks/release-notes/product/2021/march#save-your-azure-databricks-tco-with-azure-spot-vms-public-preview – Alex Ott May 06 '21 at 12:27