2

We are setting up a delta lake within Azure Synapse

Querying the delta lake works within Synapse. For that purpose, we set up a view

CREATE VIEW V_Test AS
SELECT
 *
FROM
    OPENROWSET(
        BULK 'https://XXXX.dfs.core.windows.net/gold/testtable',
        FORMAT = 'DELTA'
    ) AS [result]

When querying this view from within Synapse, results are returned as expected.

Now, we want to visualize the data in an application that only supports SQL Server Authentication. Trying to query this view from SQLServerManagementStudio works when logging in with AD authentication, but not when logging in with SQL server credentials

Msg 13807, Level 16, State 1, Procedure V_test, Line 4 [Batch Start Line 0]
Content of directory on path 'https://XXXX.dfs.core.windows.net/gold/testtable/_delta_log/*.json' cannot be listed.
Msg 4413, Level 16, State 1, Line 3
Could not use view or function 'V_test' because of binding errors.

Is there a way to make this work in a secure way when application only support SQL server credentials ?

Application -- Synapse Serverless pool -- Delta lake

Kermit754
  • 343
  • 5
  • 14

2 Answers2

1

In Synapse Studio you use the data in data lake with your interactive user identity. Query with Synapse SQL Serverless endpoint with SQL authentication use Synapse managed identity or other in view metadata level specified identity. I assume your own user account has proper permissions to source datalake in RBAC and ACL level, but the identity used when querying with serverless does not have ACL level permissions set correctly. Read and execute (r-x) permissions for the files in path /gold/testtable/** are needed.

Delta Lake table contains of multiple files. Delta Lake requires execute/list permissions in addition to read permissions to be able to see all files in folder. Make sure you have default ACL set correctly so that when the Delta Lake table is modified and new files added, also those have correct permissions.

More info for storage access permissions: https://learn.microsoft.com/en-us/azure/synapse-analytics/sql/develop-storage-files-storage-access-control?tabs=managed-identity.

More info on ACL permissions: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-access-control#how-to-set-acls

Veikko
  • 3,372
  • 2
  • 21
  • 31
  • Thanks, I got it to work with a SAS key - created on the storage account CREATE DATABASE SCOPED CREDENTIAL SasCredential WITH IDENTITY = 'SHARED ACCESS SIGNATURE', SECRET = '{SAS key}' CREATE EXTERNAL DATA SOURCE GoldDataLakeSASKey WITH ( LOCATION = 'https://XXXX.blob.core.windows.net/gold' ,CREDENTIAL = SasCredential ) CREATE VIEW V_test AS SELECT * FROM OPENROWSET( BULK 'testtable', DATA_SOURCE = 'GoldDataLakeSASKey', FORMAT = 'DELTA' ) AS [result] Next step is to try with a managed identity - so far had no luck – Kermit754 Jun 01 '22 at 11:21
  • 1
    With SAS you access the storage blob API and bypass the datalake capabilities and ACL. i would not recommend that for real data lake scenario. I don’t have exact performance metrics but I would assume you lose much of the performance with delta lake storage format because it utilizes the hierarchical namespace not available with blob. Even bigger problem for real data lake use is the lack of utilizing ACL for permissions. Your error message hints that you have MSI authentication and RBAC properly configured and only ACL level permissions not set correctly. Focus on those, then it works! – Veikko Jun 01 '22 at 14:40
1

Got it work in the following way :

Within the serverless SQL database, create an external source using a workspace identity

CREATE EXTERNAL DATA SOURCE GoldDataLakeWSIdentity
WITH (    LOCATION   = 'https://XXX.dfs.core.windows.net/gold'
,CREDENTIAL = WorkspaceIdentity 
)

and then create the view as follows

CREATE VIEW V_test AS
SELECT
 *
FROM
    OPENROWSET(
        BULK 'FolderWithinGoldContainer',
        DATA_SOURCE = 'GoldDataLakeWSIdentity',
        FORMAT = 'DELTA'
    ) AS [result]

In the datalake storage account, went to the specific folder -> Access control IAM and added the Azure synapse system assigned workspace identity as storage contributor role. From that moment, it worked as expected.

Kermit754
  • 343
  • 5
  • 14