
Trying to automate the setup of Databricks SQL.

I have done it from the UI and it works, so this is a natural next step.

The one thing I am unsure about is how to automate granting access to SQL tables and/or views using REST. I am trying to avoid a Notebook job.

I have seen the Microsoft documentation and downloaded the specification, but when I opened it with Postman I saw permissions/objectType/objectId, and the only sample I found there is for "queries". It seems to be applicable only to Queries and Dashboards. Can't this be done for Tables and Views? There is no further documentation that I could see.

So, basically: how can I do something like GRANT SELECT ON tablename TO group using the REST API, without a Notebook job? I am interested to see if I can just call a REST endpoint from our release pipeline (Azure DevOps).

Alex Ott
Saugat Mukherjee

2 Answers


As of right now, there is no REST API for setting Table ACLs. It is, however, available as part of Unity Catalog, which is currently in public preview.

If you can't use Unity Catalog yet, you can still automate the assignment of Table ACLs by using the databricks_sql_permissions resource of the Databricks Terraform Provider: it sets permissions by executing SQL commands on a cluster, but this is hidden from the administrator.
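For illustration, here is a minimal, hypothetical sketch of what that resource looks like (the database, table, and group names are placeholders; it assumes a workspace with table access control enabled):

```hcl
# Hypothetical example: grant SELECT on a table to a group via the
# Databricks Terraform provider. All names are placeholders.
resource "databricks_sql_permissions" "table_grant" {
  database = "mydatabase"
  table    = "mytable"

  privilege_assignments {
    principal  = "mygroup"
    privileges = ["SELECT"]
  }
}
```

On apply, Terraform translates this declaration into the corresponding GRANT/REVOKE statements executed on a cluster.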

Alex Ott
  • would you recommend doing it via Terraform, or simply having a SQL script in DB SQL that is called/executed by some client? We have hundreds of grant statements and they are growing. I wonder if there might be performance issues. I also wonder whether it's a problem to always run the whole thing, or whether I should make an effort to only run the changes (which is going to be tricky to figure out). – Triamus Sep 27 '22 at 13:59
  • but from the [docs](https://registry.terraform.io/providers/databricks/databricks/latest/docs/resources/sql_permissions#privilege_assignments-blocks) it seems that tf checks the desired state and removes anything that does not correspond, which would be a big advantage over manual grant statements that would require a "manual" and "costly" removal of all permissions before a run. – Triamus Sep 27 '22 at 14:11
  • Yes, the main advantage of Terraform is that it will revoke what isn't declared in the code – Alex Ott Sep 27 '22 at 15:19
  • Sorry, one more clarification. The docs always talk about table ACLs and clusters enabled with table ACLs, i.e. the classic workspace. I'm only interested in managing SQL object access in Databricks SQL, as the OP was asking. How would I translate that to DB SQL? We do not want to turn on table ACLs in the classic workspace; it's a different group of users. Or am I missing something obvious? Thanks. – Triamus Oct 06 '22 at 16:29
  • Or should we enable table access control in the classic workspace only for it to be used in Databricks SQL, i.e. not using table-ACL clusters in the classic workspace (making it ineffective there) but still having it translated to DB SQL? That feels a bit hacky. – Triamus Oct 06 '22 at 16:44

This is an extension to Alex Ott's answer, giving some details on what I tried in order to make the databricks_sql_permissions resource work for Databricks SQL, as in the OP's original question. All of this assumes that one cannot or does not want to use Unity Catalog, which follows a different permission model and has a different Terraform resource, namely the databricks_grants resource.
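For completeness, here is a minimal, hypothetical sketch of what the Unity Catalog variant could look like with the databricks_grants resource (the catalog, schema, table, and group names are placeholders; it assumes a Unity Catalog-enabled workspace, which, again, is not the scenario covered below):

```hcl
# Hypothetical example: Unity Catalog grants via databricks_grants.
# Tables use the three-level namespace catalog.schema.table;
# all names below are placeholders.
resource "databricks_grants" "student_table" {
  table = "mycatalog.test.student"

  grant {
    principal  = "data-readers"
    privileges = ["SELECT", "MODIFY"]
  }
}
```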

Alex's answer refers to table ACLs, which surprised me, as the OP (and I) were looking for Databricks SQL object security, not table ACLs in the classic workspace. But from what I understand so far, the two are closely interlinked: the Terraform provider addresses table ACLs in the classic (i.e. non-SQL) workspace, which are mirrored to SQL objects in the SQL workspace. It follows that if you want to manage SQL permissions in Databricks SQL via Terraform, you need to enable table ACLs in the classic workspace (in the admin console). If you cannot enable table ACLs (for whatever reason), it seems to me the only other option is SQL scripts in the SQL workspace, with the disadvantage of having to explicitly write out grants and revokes. A potential alternative is to throw away all permissions before running only grant statements, but this has other negative implications.

So here is my approach:

  1. Enable table ACLs in the classic workspace (as far as I know, this has no implications in the classic workspace if you don't use table ACL-enabled clusters)
  2. Use the azurerm_databricks_workspace resource to register the Databricks Azure infrastructure
  3. Use the databricks_sql_permissions resource to manage table ACLs and thus SQL object security

Below is a minimal example that worked for me and may inspire others. It certainly does not follow Terraform configuration guidance but is merely intended as a minimal illustration.

NOTE: Due to a Terraform issue, I had to ignore changes to the attribute public_network_access_enabled; see the GitHub issue: "azurerm_databricks_workspace" forces replacement on public_network_access_enabled while it never existed #15222

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.0.0"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "=1.4.0"
    }
  }

  # Remote state in an Azure Storage account
  backend "azurerm" {
    resource_group_name  = "tfstate"
    storage_account_name = "tfsa"
    container_name       = "tfstate"
    key                  = "terraform.tfstate"
  }
}

provider "azurerm" {
  features {}
}

provider "databricks" {
  azure_workspace_resource_id = "/subscriptions/mysubscriptionid/resourceGroups/myresourcegroup/providers/Microsoft.Databricks/workspaces/mydatabricksworkspace"
}

resource "azurerm_databricks_workspace" "adbtf" {
  customer_managed_key_enabled          = false
  infrastructure_encryption_enabled     = false
  load_balancer_backend_address_pool_id = null
  location                              = "westeurope"
  managed_resource_group_name           = "databricks-rg-myresourcegroup-abcdefg12345"
  managed_services_cmk_key_vault_key_id = null
  name                                  = "mydatabricksworkspace"
  network_security_group_rules_required = null
  public_network_access_enabled         = null
  resource_group_name                   = "myresourcegroup"
  sku                                   = "premium"

  custom_parameters {
    machine_learning_workspace_id                        = null
    nat_gateway_name                                     = "nat-gateway"
    no_public_ip                                         = false
    private_subnet_name                                  = null
    private_subnet_network_security_group_association_id = null
    public_ip_name                                       = "nat-gw-public-ip"
    public_subnet_name                                   = null
    public_subnet_network_security_group_association_id  = null
    storage_account_name                                 = "dbstorageabcde1234"
    storage_account_sku_name                             = "Standard_GRS"
    virtual_network_id                                   = null
    vnet_address_prefix                                  = "10.139"
  }

  tags = {
    creator = "me"
  }

  # Workaround for the GitHub issue mentioned above
  lifecycle {
    ignore_changes = [
      public_network_access_enabled
    ]
  }
}

data "databricks_current_user" "me" {}

resource "databricks_sql_permissions" "database_test" {
  database = "test"

  privilege_assignments {
    principal  = "myuser@mydomain.com"
    privileges = ["USAGE"]
  }
}

resource "databricks_sql_permissions" "table_test_student" {
  database = "test"
  table = "student"

  privilege_assignments {
    principal  = "myuser@mydomain.com"
    privileges = ["SELECT", "MODIFY"]
  }
}

output "adb_id" {
    value = azurerm_databricks_workspace.adbtf.id
}

NOTE: Serge Smertin (Databricks Terraform provider maintainer) mentioned in the GitHub issue "[DOC] databricks_sql_permissions Resource to be deprecated? #1215" that the databricks_sql_permissions resource is deprecated, but I could not find any indication of that in the docs, only a recommendation to use another resource when leveraging Unity Catalog, which I'm not doing.

Triamus