
How can you optimize a Delta table on Azure Blob Storage from Azure Databricks without putting the table into the global scope? Optimizing and Z-ordering a Delta table on Azure Blob Storage can be done as follows (cf. the docs):

spark.sql('DROP TABLE IF EXISTS T')
spark.sql("""CREATE TABLE T USING DELTA LOCATION
             'wasbs://fs@accname.blob.core.windows.net/path/to/df'""")
spark.sql('OPTIMIZE T ZORDER BY (colname)')
spark.sql('DROP TABLE IF EXISTS T')

However, the table T has global scope, so this command fails if another user has already created a table named T.

A possible solution might be the following, but is it the easiest way (and why are backticks ` needed rather than single quotes ')?

spark.sql("OPTIMIZE delta.`wasbs://fs@accname.blob.core.windows.net/path/to/df`
           ZORDER BY (colname)")
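For reference, the Python DeltaTable API seems to offer the same by-path access without any SQL quoting at all (assuming a Delta Lake version that ships delta.tables, i.e. 2.0 or later; untested here):

from delta.tables import DeltaTable

# Bind to the Delta table directly by its storage path; no name is
# registered in the metastore, so there is nothing to collide with.
dt = DeltaTable.forPath(
    spark, 'wasbs://fs@accname.blob.core.windows.net/path/to/df'
)

# Compact small files and co-locate rows by colname for data skipping.
dt.optimize().executeZOrderBy('colname')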
0vbb

1 Answer


Two thoughts:

  1. You can and should scope the table to a database. Your example above implicitly uses 'default' as the database name. Using MY_DB as an example (full sketch after this list):
spark.sql("CREATE TABLE MY_DB.T USING DELTA LOCATION
      'wasbs://fs@accname.blob.core.windows.net/path/to/df'"
  2. Yes, your suggestion is also correct. The backticks are a weird Spark-ism for referencing a dataset directly by its location (path) in a SQL statement: backticks quote an identifier, whereas single quotes would make the path a string literal.
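Putting thought 1 together end to end, a minimal sketch (MY_DB is an arbitrary placeholder name):

# Scope the scratch table to a dedicated database so it cannot
# collide with other users' tables in 'default'.
spark.sql('CREATE DATABASE IF NOT EXISTS MY_DB')
spark.sql('DROP TABLE IF EXISTS MY_DB.T')
spark.sql("""CREATE TABLE MY_DB.T USING DELTA LOCATION
             'wasbs://fs@accname.blob.core.windows.net/path/to/df'""")
spark.sql('OPTIMIZE MY_DB.T ZORDER BY (colname)')
spark.sql('DROP TABLE IF EXISTS MY_DB.T')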
Douglas M