I am running the following commands in a Databricks notebook and get an error at COMMAND 6. The only thing I can think of is that somehow I have not set up the DataFrame correctly at the start, but I have specified the format as delta.
COMMAND 1:
sales_df = spark.read.parquet(f"{DA.paths.datasets}/ecommerce/sales/sales.parquet")
delta_sales_path = f"{DA.paths.working_dir}/delta-sales"
COMMAND 2:
sales_df.write.format("delta").save(delta_sales_path)
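For what it's worth, I assume I could sanity-check this write by listing the path (dbutils is available by default in Databricks notebooks) and looking for the _delta_log directory:

# Hypothetical check, not part of the original notebook:
# a successful Delta write leaves a _delta_log directory at the root of the path.
display(dbutils.fs.ls(delta_sales_path))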
COMMAND 3:
from pyspark.sql.functions import size, col
# Replace the items array column with its element count (as an int).
updated_sales_df = (sales_df
    .withColumn("items_size", size(col("items")))
    .drop("items")
    .withColumn("items", col("items_size").cast("int")))
updated_sales_df = updated_sales_df.drop("items_size")
display(updated_sales_df)
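As an aside, since size() already returns an integer, I suspect COMMAND 3 could be collapsed into a single withColumn with the same result (modulo column ordering); a sketch, assuming all I want is to replace the items array with its element count:

# Equivalent one-step version (sketch): replace the array column with its length.
updated_sales_df = sales_df.withColumn("items", size(col("items")))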
COMMAND 4:
updated_sales_df.write.format("delta").mode("overwrite").option("overwriteSchema", True).save(delta_sales_path)
COMMAND 5:
spark.sql("""
DROP TABLE IF EXISTS sales_delta;
""")
COMMAND 6:
spark.sql("""
CREATE TABLE sales_delta
USING delta
LOCATION 'delta_sales_path';
""")
I am getting the following error:
You are trying to create an external table `hive_metastore`.`class_101_m7s0_da_asp`.`sales_delta`
from `dbfs:/user/hive/warehouse/delta_sales_path` using Delta, but there is no transaction log present at
`dbfs:/user/hive/warehouse/delta_sales_path/_delta_log`. Check the upstream job to make sure that it is writing using
format("delta") and that the path is the root of the table.