I am new to Delta Live Tables and have been working with a relatively simple pipeline.
The table I am having an issue with is defined as follows:
import dlt
from pyspark.sql.functions import first

@dlt.table(
    table_properties={"quality": "silver"}
)
def silver_catalog_product():
    eav_attributes = dlt.read("eav_attribute")
    entityDf = dlt.read("product_entity")
    entity_datetime_Df = dlt.read("product_entity_datetime")
    entity_datetime_Df = entity_datetime_Df.join(eav_attributes, entity_datetime_Df.attribute_id == eav_attributes.attribute_id, "inner")
    entity_datetime_Df = entity_datetime_Df.groupBy("entity_id").pivot("attribute_code").agg(first("value"))
    df = (entityDf
          .join(entity_datetime_Df, entityDf.entity_id == entity_datetime_Df.entity_id, "inner")
          .drop(entity_datetime_Df.entity_id)
          )
    return df
However, when the pipeline runs I get the following error:
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.
Table schema:
root
-- entity_id: long (nullable = true)
-- entity_type_id: integer (nullable = true)
-- attribute_set_id: integer (nullable = true)
-- type_id: string (nullable = true)
-- sku: string (nullable = true)
-- created_at: timestamp (nullable = true)
-- updated_at: timestamp (nullable = true)
-- has_options: integer (nullable = true)
-- required_options: integer (nullable = true)
Data schema:
root
-- entity_id: long (nullable = true)
-- entity_type_id: integer (nullable = true)
-- attribute_set_id: integer (nullable = true)
-- type_id: string (nullable = true)
-- sku: string (nullable = true)
-- created_at: timestamp (nullable = true)
-- updated_at: timestamp (nullable = true)
-- has_options: integer (nullable = true)
-- required_options: integer (nullable = true)
-- custom_design_from: timestamp (nullable = true)
-- custom_design_to: timestamp (nullable = true)
-- news_from_date: timestamp (nullable = true)
-- news_to_date: timestamp (nullable = true)
-- price_update_type_revert_date: timestamp (nullable = true)
-- special_from_date: timestamp (nullable = true)
-- special_to_date: timestamp (nullable = true)
To overwrite your schema or change partitioning, please set:
'.option("overwriteSchema", "true")'.
Note that the schema can't be overwritten when using
'replaceWhere'.
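For reference, in a regular (non-DLT) notebook I understand the session-configuration route the error mentions to look roughly like the sketch below (the config key is taken directly from the error message); what I cannot work out is where the equivalent setting would go in a Delta Live Tables pipeline.

```python
# Sketch only: enabling automatic schema merging for the Spark session,
# as the error message suggests for "other operations". Where (or whether)
# this belongs in a DLT pipeline is exactly what I'm unsure about.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")
```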
I have two main questions:

- The error suggests `.option("overwriteSchema", "true")` or `.option("mergeSchema", "true")`. I am familiar with these options from regular Delta PySpark jobs (see the snippet below for how I would normally use them), but I cannot find any documentation on how to enable `overwriteSchema` or `mergeSchema` in Delta Live Tables.
- Why is the job inferring the schema from the `entityDf` table and not from the actual dataframe `df` being returned by the function?
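For the first question, this is roughly how I would apply those options in a plain PySpark job where I control the write myself (a sketch with a placeholder table name, not code from this pipeline):

```python
# Sketch of how I'd normally use these options outside of DLT
# ("my_silver_table" is a placeholder name).

# Append while letting new columns be added to the target schema:
(df.write.format("delta")
   .mode("append")
   .option("mergeSchema", "true")
   .saveAsTable("my_silver_table"))

# Or overwrite the table and replace its schema entirely:
(df.write.format("delta")
   .mode("overwrite")
   .option("overwriteSchema", "true")
   .saveAsTable("my_silver_table"))
```

What I can't find is the equivalent place to put either option when the table is produced by a `@dlt.table` function instead of an explicit write.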