
I have an external Delta table partitioned by year in my storage, and when I try to "mount" the Delta table in Databricks using this code:

CREATE TABLE if not exists profitek.products
                   USING DELTA                 
                   LOCATION 'dbfs:/mnt/curated-l1/profitek/products'

I get this error:

You are trying to create an external table spark_catalog.general.products from dbfs:/mnt/curated-l1/general/products using Databricks Delta, but there is no transaction log present at dbfs:/mnt/curated-l1/general/products/_delta_log. Check the upstream job to make sure that it is writing using format("delta") and that the path is the root of the table.

The _delta_log directory is inside every partition folder:

products
│
├── 2022-01-01
│   ├── _delta_log
│   ├── part_01.parquet
│   └── part_02.parquet
├── 2022-01-02
    ├── _delta_log
    ├── part_01.parquet
    └── part_02.parquet

So I can mount a PART of the table using something like this:

CREATE TABLE if not exists profitek.products
                   USING DELTA                 
                   LOCATION 'dbfs:/mnt/curated-l1/profitek/products/2022-01-01'

How can I "mount" the full table, with all its partitions?

Andrés Bustamante

1 Answer


Your Delta table is not partitioned using Delta's own partitioning capabilities, i.e. df.write.format("delta").partitionBy(<date_column_name>).save(path).
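
For reference, a minimal sketch of such a write, assuming the upstream job targets the mount location from your question and that the table has a date column named "date" (both are assumptions, adapt them to your data):

# Write the DataFrame as ONE Delta table, partitioned by a date column.
# "date" and the target path are assumptions -- use your own column and location.
df.write.format("delta").partitionBy("date").save("dbfs:/mnt/curated-l1/profitek/products")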

If done in the way described above, the directory structure would look like this:

products
├── _delta_log
│   └── 0000000.json
├── 2022-01-01
│   ├── part_01.parquet
│   └── part_02.parquet
├── 2022-01-02
    ├── part_01.parquet
    └── part_02.parquet

Judging by the directory structure you provided, it seems the partitions were written explicitly, i.e. each partition was written as its own Delta table to its own location. Something like:

  • df_2022_01_01.write.format("delta").save("products/2022-01-01")
  • df_2022_01_02.write.format("delta").save("products/2022-01-02")

To get all the partitions into one DataFrame, you need to read the individual ones and union them (assuming the schema is the same):

df_2022_01_01 = spark.read.format("delta").load("products/2022-01-01")
df_2022_01_02 = spark.read.format("delta").load("products/2022-01-02")
# then union the two DataFrames
df = df_2022_01_01.union(df_2022_01_02)

I've shown this for only 2 DataFrames; you can use the same approach to union any number of them in a for-loop, as sketched below.
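
A sketch of that loop, assuming only the two partition folders from your listing (extend the list, or discover the folders with dbutils.fs.ls, for the real table):

from functools import reduce

# Partition folders that were each written as a separate Delta table
# (only the two from the listing above -- extend as needed)
partition_paths = [
    "dbfs:/mnt/curated-l1/profitek/products/2022-01-01",
    "dbfs:/mnt/curated-l1/profitek/products/2022-01-02",
]

# Read each folder and union them all; the schemas must match
dfs = [spark.read.format("delta").load(p) for p in partition_paths]
df = reduce(lambda a, b: a.unionByName(b), dfs)

unionByName matches columns by name rather than by position; plain union (as above) works too if every partition has exactly the same column order.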

Now that you have the complete data in one DataFrame, you can write it back out partitioned with Delta for easier access later on: df.write.format("delta").partitionBy(<date_column_name>).save("/products/complete-delta")
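
Once that rewrite is done, the root folder contains a single _delta_log, so the CREATE TABLE from your question works against the new location. A sketch, reusing the /products/complete-delta path from the line above:

# Register the rewritten Delta table; _delta_log now sits at the root of LOCATION
spark.sql("""
    CREATE TABLE IF NOT EXISTS profitek.products
    USING DELTA
    LOCATION '/products/complete-delta'
""")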

o_O