I have always been under the impression that the following code creates a Delta table:

data.write.format("delta").save("/path/to/delta-table")

This does create the files. However, I noticed today that the table does not show up in the Data section of Databricks, under hive_metastore.

For the table to show up there, I have to run something like:

CREATE TABLE some_table USING DELTA LOCATION "/path/to/delta-table"

What exactly is going on here? Was I wrong in my understanding that the .write operation creates a table? What is the difference between these commands?

Minura Punchihewa

1 Answer

DataFrameWriter has the following methods:

def save(path: String): Unit

Saves the content of the DataFrame at the specified path.

def saveAsTable(tableName: String): Unit

Saves the content of the DataFrame as the specified table.

Calling .save("/path/to/delta-table") only writes the data in Delta format to the filesystem; it does not register anything. For the table to be visible in the data catalog (a.k.a. the metastore), you need to run CREATE TABLE with that location.

Alternatively, you can write the data with .saveAsTable("delta_table"), which writes it under a path managed by the metastore and registers the table in one step.
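To make the difference concrete, here is a minimal PySpark sketch of both routes. It assumes `spark` is an active SparkSession with Delta Lake available (as on Databricks) and `data` is a DataFrame; the helper names are hypothetical and only for illustration.

```python
def register_existing_delta_table(spark, table_name, path):
    """Create a metastore entry that points at Delta files already written to `path`."""
    # Assumption: `spark` is an active SparkSession with Delta Lake support.
    statement = f"CREATE TABLE {table_name} USING DELTA LOCATION '{path}'"
    spark.sql(statement)
    return statement


def write_path_then_register(spark, data, table_name, path):
    """Route 1: write the files, then register them as a table in a second step."""
    # Writes Delta files only; nothing is added to the metastore here.
    data.write.format("delta").save(path)
    # Registers a table whose data lives at the path written above.
    return register_existing_delta_table(spark, table_name, path)


def write_managed_table(data, table_name):
    """Route 2: write and register in one step at a metastore-managed location."""
    data.write.format("delta").saveAsTable(table_name)
```

For example, `write_path_then_register(spark, data, "some_table", "/path/to/delta-table")` reproduces the two commands from the question, while `write_managed_table(data, "some_table")` lets the metastore choose where the files go.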

Kombajn zbożowy
  • Thank you for your answer. I have a quick follow-up question: if I execute a `CREATE TABLE` command on top of data that has been saved using `.save()`, does it affect the structure or nature of this data in any way? I ask because I still have users who come in and run queries directly on top of the files. – Minura Punchihewa Nov 22 '22 at 14:30
  • It does not affect the structure. Regardless of whether a table is created on top of the files, you can still read directly from the path with ``SELECT * FROM delta.`/path/to/delta-table` ``. – Kombajn zbożowy Nov 22 '22 at 16:31
  • Thank you. I think you've answered my question perfectly. – Minura Punchihewa Nov 22 '22 at 17:35
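
As a rough illustration of the point made in the comments, the sketch below reads the same data both by path and by registered name. It assumes `spark` is an active SparkSession with Delta Lake available and that the table has already been registered; the function names are hypothetical.

```python
def path_query(path):
    """Build a query that reads Delta files directly from a filesystem path."""
    # The backticks quote the path so it is parsed as a single identifier.
    return f"SELECT * FROM delta.`{path}`"


def read_both_ways(spark, table_name, path):
    """Return the data read by path and by metastore name."""
    # Reading by path works whether or not a table is registered on top of the files.
    by_path = spark.sql(path_query(path))
    # Reading by name requires the table to exist in the metastore.
    by_name = spark.table(table_name)
    return by_path, by_name
```

Both results refer to the same underlying files, so users who query the path directly are unaffected by the table registration.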