
How can you optimize a Delta table on Azure Blob Storage from Azure Databricks without putting the table into the global scope? Optimizing and Z-ordering a Delta table on Azure Blob Storage can be done as follows (cf. the docs):

spark.sql('DROP TABLE IF EXISTS T')
spark.sql("""CREATE TABLE T USING DELTA LOCATION
             'wasbs://fs@accname.blob.core.windows.net/path/to/df'""")
spark.sql('OPTIMIZE T ZORDER BY (colname)')
spark.sql('DROP TABLE IF EXISTS T')

However, the table T has global scope, so this command fails if another user has already created a table named T.

A possible solution might be the following, but is it the easiest way (and why are backticks ` needed rather than single quotes ')?

spark.sql("OPTIMIZE delta.`wasbs://fs@accname.blob.core.windows.net/path/to/df`
           ZORDER BY (colname)")
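For reference, the Python DeltaTable API seems to offer the same by-path access without any SQL quoting at all (assuming a Delta Lake version that ships delta.tables, i.e. 2.0 or later; untested here):

from delta.tables import DeltaTable

# Bind to the Delta table directly by its storage path; no name is
# registered in the metastore, so there is nothing to collide with.
dt = DeltaTable.forPath(
    spark, 'wasbs://fs@accname.blob.core.windows.net/path/to/df'
)

# Compact small files and co-locate rows by colname for data skipping.
dt.optimize().executeZOrderBy('colname')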
0vbb

1 Answer


Two thoughts:

  1. You can and should scope the table to a database. Your example above implicitly uses 'default' as the database name. Using MY_DB as an example (full sketch after this list):
spark.sql("CREATE TABLE MY_DB.T USING DELTA LOCATION
      'wasbs://fs@accname.blob.core.windows.net/path/to/df'"
  2. Yes, your suggestion is also correct. The backticks are a weird Spark-ism for referencing a dataset directly by its location (path) in a SQL statement: backticks quote an identifier, whereas single quotes would make the path a string literal.
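Putting thought 1 together end to end, a minimal sketch (MY_DB is an arbitrary placeholder name):

# Scope the scratch table to a dedicated database so it cannot
# collide with other users' tables in 'default'.
spark.sql('CREATE DATABASE IF NOT EXISTS MY_DB')
spark.sql('DROP TABLE IF EXISTS MY_DB.T')
spark.sql("""CREATE TABLE MY_DB.T USING DELTA LOCATION
             'wasbs://fs@accname.blob.core.windows.net/path/to/df'""")
spark.sql('OPTIMIZE MY_DB.T ZORDER BY (colname)')
spark.sql('DROP TABLE IF EXISTS MY_DB.T')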
Douglas M