
I managed to connect to Databricks from Python using the following code snippet:

from databricks import sql

connection = sql.connect(
  server_hostname='<server-hostname>',
  http_path='<http-path>',
  access_token='<personal-access-token>')

cursor = connection.cursor()

cursor.execute('SELECT * FROM <database-name>.<table-name> LIMIT 2')

result = cursor.fetchall()

for row in result:
  print(row)

cursor.close()
connection.close()

This snippet is from the official documentation and as you can see, it requires server_hostname, http_path and access_token. My question here is, can I authenticate myself without the access_token? Maybe use a managed identity, since both technologies are from Microsoft?

anthino12

1 Answer


You can use an AAD token instead of a personal access token. The simplest way to generate one is to use the azure-identity library from Microsoft.

Here is example code that generates an AAD token for a service principal and uses it as the access token:

from databricks import sql
from azure.identity import ClientSecretCredential
import os

tenant_id = '...'
client_id = '...'
client_secret = os.environ['SP_SECRET']
csc = ClientSecretCredential(tenant_id, client_id, client_secret)

# important! This is the AAD resource ID of Azure Databricks itself
dbx_scope = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default'
# this will be used as access token
aad_token = csc.get_token(dbx_scope).token

connection = sql.connect(
  server_hostname='<server-hostname>',
  http_path='<http-path>',
  access_token=aad_token)

It should also be possible to use a managed identity for that - just use the ManagedIdentityCredential class instead. I believe you can also generate a user's AAD token with that library. If you don't want an additional dependency, you can generate the token via the REST APIs directly.
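If your code runs on an Azure resource with a managed identity assigned (VM, App Service, AKS, etc.), the same pattern could look roughly like this. A sketch, not tested against a live workspace; the `get_dbx_token` helper is my own wrapper, not part of either library:

```python
try:
    from azure.identity import ManagedIdentityCredential
except ImportError:  # the library is optional for this sketch
    ManagedIdentityCredential = None

# AAD resource ID of Azure Databricks - the same scope as for the service principal
DBX_SCOPE = '2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default'

def get_dbx_token(credential):
    """Return an AAD access token string usable as the Databricks access_token."""
    return credential.get_token(DBX_SCOPE).token

# On an Azure resource with a managed identity assigned:
# aad_token = get_dbx_token(ManagedIdentityCredential())
# connection = sql.connect(server_hostname='<server-hostname>',
#                          http_path='<http-path>',
#                          access_token=aad_token)
```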

The identity that will be used needs to be added into the Azure Databricks workspace by an administrator, using the corresponding REST API or the Databricks Terraform provider. It also needs to be granted permissions to access the Databricks cluster or SQL endpoint - that's likewise doable via the REST API or Terraform.

One thing to take into account is that AAD tokens expire (how quickly depends on whether you're using a service principal or a managed identity), so you may need to re-create the connection periodically.
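One way to handle that: keep the AccessToken object returned by get_token (it carries an `expires_on` Unix timestamp) and rebuild the connection when the token is about to lapse. The `needs_refresh` helper and the 5-minute leeway below are my own illustration, not part of the connector:

```python
import time

def needs_refresh(expires_on, leeway=300):
    """True if the token expires within `leeway` seconds (or already has)."""
    return expires_on - time.time() < leeway

# token = csc.get_token(dbx_scope)   # AccessToken with .token and .expires_on
# if needs_refresh(token.expires_on):
#     token = csc.get_token(dbx_scope)
#     connection = sql.connect(server_hostname='<server-hostname>',
#                              http_path='<http-path>',
#                              access_token=token.token)
```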

Alex Ott
  • Thank you for the answer Alex. Is `dbx_scope` the scope from Databricks? Where can I find it? – anthino12 Nov 25 '21 at 12:25
  • yes, it's the scope. It's documented here: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/aad/service-prin-aad-token#--get-an-azure-active-directory-access-token – Alex Ott Nov 25 '21 at 12:37