
I am running the following commands in a Databricks notebook and get an error at COMMAND 6. The only explanation I can think of is that I somehow did not set up the DataFrame correctly at the start, but I did specify delta as the format when writing it.

COMMAND 1:

sales_df = spark.read.parquet(f"{DA.paths.datasets}/ecommerce/sales/sales.parquet")
delta_sales_path = f"{DA.paths.working_dir}/delta-sales"

COMMAND 2:

sales_df.write.format("delta").save(delta_sales_path)

COMMAND 3:

from pyspark.sql.functions import size, col

updated_sales_df = sales_df.withColumn("items_size", size(col("items"))).drop("items").withColumn("items", col("items_size").cast("int"))
updated_sales_df = updated_sales_df.drop("items_size")

display(updated_sales_df)

COMMAND 4:

updated_sales_df.write.format("delta").mode("overwrite").option("overwriteSchema", True).save(delta_sales_path)

COMMAND 5:

spark.sql("""
DROP TABLE IF EXISTS sales_delta;
""")

COMMAND 6:

spark.sql("""
CREATE TABLE sales_delta
USING delta
LOCATION 'delta_sales_path';
""")

I am getting the following error:

You are trying to create an external table `hive_metastore`.`class_101_m7s0_da_asp`.`sales_delta`
from `dbfs:/user/hive/warehouse/delta_sales_path` using Delta, but there is no transaction log present at
`dbfs:/user/hive/warehouse/delta_sales_path/_delta_log`. Check the upstream job to make sure that it is writing using
format("delta") and that the path is the root of the table.
Ram Nathan

1 Answer


If Delta data already exists at the path stored in delta_sales_path, the statement you posted will create the table over it.

If there is no data at that path, you are in effect creating an empty table. In that case the CREATE statement must also specify a schema, for example:

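# f-string so that LOCATION receives the value of delta_sales_path,
# not the literal text 'delta_sales_path'
spark.sql(f"""
CREATE TABLE sales_delta
(id INT, name STRING)
USING delta
LOCATION '{delta_sales_path}'
""")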
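The error message in the question resolves the location to dbfs:/user/hive/warehouse/delta_sales_path, which suggests the text 'delta_sales_path' was used as the path itself instead of the value of the Python variable. A minimal sketch of building the statement with the path interpolated is below. The helper name create_table_sql and the example path are hypothetical and used for illustration only.

```python
from typing import Dict, Optional


def create_table_sql(table: str, location: str,
                     columns: Optional[Dict[str, str]] = None) -> str:
    """Build a CREATE TABLE ... USING delta statement (hypothetical helper).

    `location` is interpolated into the SQL text, so the statement points at
    the real path rather than at the name of a Python variable.
    `columns` is optional: pass it only when no data exists at the path yet.
    """
    schema = ""
    if columns:
        # Render {"id": "INT", "name": "STRING"} as " (id INT, name STRING)"
        schema = " (" + ", ".join(f"{name} {dtype}" for name, dtype in columns.items()) + ")"
    return f"CREATE TABLE {table}{schema} USING delta LOCATION '{location}'"


# Hypothetical path, standing in for the value of delta_sales_path
example_path = "dbfs:/tmp/example/delta-sales"
print(create_table_sql("sales_delta", example_path))
```

In the notebook this would be used as spark.sql(create_table_sql("sales_delta", delta_sales_path)), assuming delta_sales_path is still defined from COMMAND 1.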

Alternatively, if you want to create an empty table from the schema of an existing DataFrame, try the code below:

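# df is any DataFrame with the desired schema; delta_path is the target location
df.createOrReplaceTempView('schema')
# "where 1=0" keeps the columns but returns no rows
df_schema = spark.sql("select * from schema where 1=0")
# state the format explicitly so a Delta transaction log is written
df_schema.write.format("delta").save(delta_path)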
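The error in the question complains that no transaction log was found under `_delta_log`, so it can help to check the path before running CREATE TABLE. The sketch below does this with plain file-system calls. The function name has_delta_log is hypothetical, and the check assumes the path is reachable as a local directory (on Databricks that would typically mean the /dbfs/... form of a dbfs:/ path, which is an assumption about the cluster setup).

```python
from pathlib import Path


def has_delta_log(table_root: str) -> bool:
    """Return True if `table_root` contains a `_delta_log` directory.

    Hypothetical helper: it only inspects a locally visible directory and
    does not validate the contents of the log.
    """
    return (Path(table_root) / "_delta_log").is_dir()
```

If this returns False for the location you pass to CREATE TABLE, either the path is not the root of the table or nothing was written there in Delta format.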