
Recently I've been developing a Python package, install_databricks_packages, which contacts the Databricks REST APIs (using requests, not the CLI) to install packages on Databricks clusters. This package is used in release pipelines, where one can add a bash script that invokes install_databricks_packages as a CLI to install the needed packages on one or more clusters.

The problem is that during development I realized I need two different tokens to make install_databricks_packages work with packages hosted on our private Azure Artifacts feed, where we host some internally developed packages. The first token is a Databricks PAT, needed to authorize the API call; the second is an Azure DevOps PAT, needed when calling the /api/2.0/libraries/install API to install a package from the private feed. Basically, I need to call the API like this:

import requests

data = {
    "cluster_id": "123",
    "libraries": [
        {"pypi": {
            "package": "private_package==1.0.0",
            "repo": "https://<devops-token>@pkgs.dev.azure.com/<company>/pypi/simple/"
        }}
    ]
}

requests.post(
    "https://<host>/api/2.0/libraries/install",
    json=data,
    auth=("token", "<databricks-token>"),
)
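As a minimal sketch of how my package assembles that request body (the helper name is hypothetical, and the <company> placeholder stands in for our organization): the DevOps PAT is URL-encoded before being embedded in the index URL, since a PAT can contain characters that are reserved in URLs.

```python
from urllib.parse import quote


def build_install_payload(cluster_id, package, devops_pat):
    # Hypothetical helper: builds the body for /api/2.0/libraries/install.
    # The PAT is percent-encoded so reserved characters don't break the URL.
    repo = (
        "https://"
        + quote(devops_pat, safe="")
        + "@pkgs.dev.azure.com/<company>/pypi/simple/"
    )
    return {
        "cluster_id": cluster_id,
        "libraries": [{"pypi": {"package": package, "repo": repo}}],
    }


payload = build_install_payload("123", "private_package==1.0.0", "s3cr3t/pat==")
```

The payload can then be passed as json= to requests.post as shown above.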

I generated the two tokens with my user account and saved them in Azure Key Vault as two separate secrets, which can then be fetched in any release pipeline using the Azure Key Vault task.
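For context, this is roughly how the CLI picks the secrets up at runtime: the Key Vault task exposes them as pipeline variables, which the bash step maps into environment variables that the tool reads. The variable names below are illustrative, not the actual ones.

```python
import os


def read_tokens():
    # Hypothetical env var names; in Azure Pipelines, secret variables must be
    # mapped explicitly into the script step's environment before this runs.
    databricks_pat = os.environ["DATABRICKS_PAT"]
    devops_pat = os.environ["DEVOPS_PAT"]
    return databricks_pat, devops_pat


# Simulate the pipeline environment for this sketch.
os.environ["DATABRICKS_PAT"] = "dapi-example"
os.environ["DEVOPS_PAT"] = "devops-example"
tokens = read_tokens()
```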

I was wondering whether this is the only course of action. Having two PATs that are tied to a specific user and have expiration dates, and thus have to be managed manually, is cumbersome. I couldn't find a better solution online, so any advice is welcome!

luigi