I need to install a JAR file as a library while setting up a Databricks cluster as part of my Azure Release pipeline. As of now, I have completed the following -
- use an Azure CLI task to create the cluster definition
- use a curl command to download the JAR file from the Maven repository into the pipeline agent folder
- set up the Databricks CLI on the pipeline agent
- use databricks fs cp to copy the JAR file from the local (pipeline agent) directory to the dbfs:/FileStore/jars folder (see the sketch after this list)
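For reference, a minimal sketch of the download-and-copy steps; the Maven URL and JAR name below are placeholders, not the actual artifact:

#!/bin/bash
# Download the JAR from the Maven repository (placeholder URL/coordinates)
curl -fSL -o my-library-1.0.0.jar \
  "https://repo1.maven.org/maven2/com/example/my-library/1.0.0/my-library-1.0.0.jar"

# Copy it into DBFS so the cluster can reach it later
databricks fs cp my-library-1.0.0.jar dbfs:/FileStore/jars/my-library-1.0.0.jar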
I am trying to create a cluster-scoped init script (bash) that will -
- install the pandas, azure-cosmos and python-magic packages
- install the JAR file (already copied to dbfs:/FileStore/jars in the earlier steps) as a cluster library
My cluster init script looks like this -
#!/bin/bash
set -e  # stop and surface the error if any install step fails

# Install the Python packages into the cluster's Python environment
/databricks/python/bin/pip install pandas
/databricks/python/bin/pip install azure-cosmos
/databricks/python/bin/pip install python-magic
But I don't know -
- whether this will actually install the packages on the cluster
- how to add an existing JAR file to the cluster as a library (a sketch of what I am considering is below)
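From what I have read, cluster-scoped init scripts run on every node before Spark starts, DBFS is mounted locally at /dbfs, and JARs placed under /databricks/jars/ end up on the cluster classpath. Assuming all of that holds (the JAR name is a placeholder), I am considering appending something like this to the init script:

# Copy the JAR from DBFS (mounted locally at /dbfs) into the directory
# whose JARs are picked up by the driver and executors when Spark starts
cp /dbfs/FileStore/jars/my-library-1.0.0.jar /databricks/jars/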
I know there are other ways to edit cluster library metadata (for example the Libraries REST API, sketched below), but as far as I know, any change to the cluster's libraries requires the cluster to be in the RUNNING state, which may not be the case for us. That is why I want to add an init script to my cluster definition, so that whenever the cluster is RESTARTED/RUNNING, the init script gets executed.
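For completeness, this is the kind of call I am referring to: the Databricks Libraries API install endpoint. The workspace URL, token variable and cluster ID below are placeholders, and my understanding is that the target cluster has to be up for the installation to proceed:

curl -X POST "https://<databricks-instance>/api/2.0/libraries/install" \
  -H "Authorization: Bearer $DATABRICKS_TOKEN" \
  -d '{
        "cluster_id": "1234-567890-abcde123",
        "libraries": [
          { "jar": "dbfs:/FileStore/jars/my-library-1.0.0.jar" }
        ]
      }'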
Please help.
Thanks, Subhash