Is it possible to switch workspace with the use of databricks-connect?

I'm currently trying to switch with: spark.conf.set('spark.driver.host', cluster_config['host'])

But this returns the following error: AnalysisException: Cannot modify the value of a Spark config: spark.driver.host

PvG

3 Answers

If you look into the documentation on setting the client, you will see that there are three methods to configure Databricks Connect:

  • Configuration file generated with databricks-connect configure - the file name is always ~/.databricks-connect,
  • Environment variables - DATABRICKS_ADDRESS, DATABRICKS_API_TOKEN, ...
  • Spark configuration properties - spark.databricks.service.address, spark.databricks.service.token, ... But with this method the Spark session may already be initialized, so you may not be able to switch on the fly without restarting Spark.
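For the environment-variable method, the switch could look like the following sketch; the URL, token, and cluster ID here are placeholders, not real values, and the variables must be set before the first Spark session is created:

```python
import os

# Placeholder values only - substitute your own workspace details.
os.environ["DATABRICKS_ADDRESS"] = "https://westeurope.azuredatabricks.net"
os.environ["DATABRICKS_API_TOKEN"] = "dapi-placeholder-token"
os.environ["DATABRICKS_CLUSTER_ID"] = "0123-456789-abcdef01"

# A SparkSession created after this point picks up the variables, e.g.:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.getOrCreate()
```

Because the variables are read at session creation, switching workspaces this way still means tearing down the current Spark session first.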

But if you use different DBR versions, then it's not enough to change configuration properties - you also need to switch to a Python environment that contains the corresponding version of the Databricks Connect distribution.

For my own work I wrote the following Zsh script that makes it easy to switch between different setups (shards) - although it allows only one shard to be in use at a time. The prerequisites are:

  • A Python environment is created with the name <name>-shard
  • databricks-connect is installed into the activated environment with:
pyenv activate field-eng-shard
pip install -U databricks-connect==<DBR-version>
  • databricks-connect is configured once, and the configuration for a specific cluster/shard is stored in a ~/.databricks-connect-<name> file that will be symlinked to ~/.databricks-connect
function use-shard() {
    SHARD_NAME="$1"
    if [ -z "$SHARD_NAME" ]; then
        echo "Usage: use-shard shard-name"
        return 1
    fi
    if [ ! -L ~/.databricks-connect ] && [ -f ~/.databricks-connect ]; then
        echo "There is ~/.databricks-connect file - possibly you configured another shard"
    elif [ -f ~/.databricks-connect-${SHARD_NAME} ]; then
        rm -f ~/.databricks-connect
        ln -s ~/.databricks-connect-${SHARD_NAME} ~/.databricks-connect
        pyenv deactivate
        pyenv activate ${SHARD_NAME}-shard
    else
        echo "There is no configuration file for shard: ~/.databricks-connect-${SHARD_NAME}"
    fi
}
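The core of the switch is the symlink swap. Here is a self-contained sketch of just that part, runnable anywhere because it uses a temporary directory instead of your real home (the shard name and file contents are placeholders):

```shell
# Use a temporary directory instead of the real $HOME for this demo.
HOME=$(mktemp -d)
echo '{"cluster_id": "dev-cluster-id"}' > "$HOME/.databricks-connect-dev"

# Point ~/.databricks-connect at the per-shard configuration file,
# exactly as the use-shard function does.
rm -f "$HOME/.databricks-connect"
ln -s "$HOME/.databricks-connect-dev" "$HOME/.databricks-connect"

cat "$HOME/.databricks-connect"
```

Because databricks-connect re-reads ~/.databricks-connect when a new session starts, swapping the symlink is enough - no reconfiguration is needed.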
Alex Ott

I created a simple Python script to change the cluster_id within the .databricks-connect configuration file.

To execute it, ensure your virtual environment has the environment variable DATABRICKS_CLUSTER configured. Obtaining the cluster ID is described in the official databricks-connect documentation.

Set the environment variable with:

export DATABRICKS_CLUSTER=your-cluster-id

Once the environment variable is set, simply run the following Python script to switch clusters whenever your new virtual environment is activated.

import os
import json

# Get the Databricks cluster associated with the current virtual env
DATABRICKS_CLUSTER = os.getenv('DATABRICKS_CLUSTER')
HOME = os.getenv('HOME')

if DATABRICKS_CLUSTER is None:
    raise SystemExit('DATABRICKS_CLUSTER is not set')

# Open the databricks-connect config file
with open(f'{HOME}/.databricks-connect', 'r') as j:
    config = json.load(j)

# Update the cluster ID
config['cluster_id'] = DATABRICKS_CLUSTER

# Save the databricks-connect config file
with open(f'{HOME}/.databricks-connect', 'w') as outfile:
    json.dump(config, outfile, indent=4)
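A quick way to sanity-check the read-modify-write logic without touching your real config is to run the same steps against a temporary file; all paths and IDs below are placeholders:

```python
import json
import os
import tempfile

# Create a throwaway config file standing in for ~/.databricks-connect.
home = tempfile.mkdtemp()
path = os.path.join(home, ".databricks-connect")
with open(path, "w") as f:
    json.dump({"cluster_id": "old-cluster", "port": 15001}, f)

# Same steps as the script: load, update cluster_id, write back.
with open(path) as f:
    config = json.load(f)
config["cluster_id"] = "new-cluster"  # placeholder cluster ID
with open(path, "w") as f:
    json.dump(config, f, indent=4)

with open(path) as f:
    print(json.load(f)["cluster_id"])  # prints new-cluster
```

Note that other keys in the file (token, port, org ID) are preserved untouched, since only cluster_id is reassigned.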

It probably doesn't answer your question directly, but it's also possible to use the Visual Studio Code Databricks plugin, which uses databricks-connect and makes it very easy to switch between different environments: https://marketplace.visualstudio.com/items?itemName=paiqo.databricks-vscode. An example connections block from the VSCode settings:

        "databricks.connectionManager": "VSCode Settings",
        "databricks.connections": [
            {
                "apiRootUrl": "https://westeurope.azuredatabricks.net",
                "displayName": "My DEV workspace",
                "localSyncFolder": "c:\\Databricks\\dev",
                "personalAccessToken": "dapi219e30212312311c6721a66ce879e"
            },
            {
                "apiRootUrl": "https://westeurope.azuredatabricks.net",
                "displayName": "My TEST workspace",
                "localSyncFolder": "c:\\Databricks\\test",
                "personalAccessToken": "dapi219e30212312311c672aaaaaaaaaa"
            }
        ],
        ...
chomar.c