
I am new to Delta Live Tables and have been working with a relatively simple pipeline.

The table I am having an issue with is defined as follows:

```python
import dlt
from pyspark.sql.functions import first

@dlt.table(
    table_properties={"quality": "silver"}
)
def silver_catalog_product():
    # EAV attribute metadata (maps attribute_id to attribute_code)
    eav_attributes = dlt.read("eav_attribute")

    entityDf = dlt.read("product_entity")

    # Datetime attribute values, pivoted into one column per attribute_code
    entity_datetime_Df = dlt.read("product_entity_datetime")
    entity_datetime_Df = entity_datetime_Df.join(
        eav_attributes, entity_datetime_Df.attribute_id == eav_attributes.attribute_id, "inner"
    )
    entity_datetime_Df = (
        entity_datetime_Df
        .groupBy("entity_id")
        .pivot("attribute_code")
        .agg(first("value"))
    )

    # Attach the pivoted datetime columns to the product entity rows
    df = (
        entityDf
        .join(entity_datetime_Df, entityDf.entity_id == entity_datetime_Df.entity_id, "inner")
        .drop(entity_datetime_Df.entity_id)
    )

    return df
```
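The columns produced by the pivot step are not fixed by the code: when `pivot("attribute_code")` is called without a list of values, Spark first collects the distinct `attribute_code` values present in the data and creates one column per value. The plain-Python sketch below models that behaviour so it can run without a cluster. `pivot_first` is a hypothetical helper, and the input is assumed to be `(entity_id, attribute_code, value)` tuples like those produced by the join with `eav_attribute`.

```python
from typing import Any, Dict, List, Tuple

Row = Tuple[int, str, Any]  # (entity_id, attribute_code, value)


def pivot_first(rows: List[Row]) -> Tuple[List[str], Dict[int, Dict[str, Any]]]:
    """Model of groupBy("entity_id").pivot("attribute_code").agg(first("value")).

    With no explicit pivot values, the output columns are the distinct
    attribute codes present in the input (sorted, as Spark does).
    """
    codes = sorted({code for _, code, _ in rows})
    pivoted: Dict[int, Dict[str, Any]] = {}
    seen = set()
    for entity_id, code, value in rows:
        cells = pivoted.setdefault(entity_id, {c: None for c in codes})
        # first(): keep the first value seen for each (entity, attribute) cell
        if (entity_id, code) not in seen:
            seen.add((entity_id, code))
            cells[code] = value
    return ["entity_id"] + codes, pivoted


# Two hypothetical pipeline runs: the second sees one more attribute code,
# so the pivoted (and therefore the returned) schema gains a column.
run1 = [(1, "news_from_date", "2022-01-01")]
run2 = run1 + [(2, "special_to_date", "2022-06-30")]
print(pivot_first(run1)[0])  # ['entity_id', 'news_from_date']
print(pivot_first(run2)[0])  # ['entity_id', 'news_from_date', 'special_to_date']
```

If the set of datetime attributes is known in advance, PySpark's `GroupedData.pivot` also accepts an explicit `values` list, which pins the output columns regardless of the data. Whether that alone avoids the error in this pipeline is untested.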

However, when the pipeline runs I get the following error:

```
To enable schema migration using DataFrameWriter or DataStreamWriter, please set:
'.option("mergeSchema", "true")'.
For other operations, set the session configuration
spark.databricks.delta.schema.autoMerge.enabled to "true". See the documentation
specific to the operation for details.

Table schema:
root
 |-- entity_id: long (nullable = true)
 |-- entity_type_id: integer (nullable = true)
 |-- attribute_set_id: integer (nullable = true)
 |-- type_id: string (nullable = true)
 |-- sku: string (nullable = true)
 |-- created_at: timestamp (nullable = true)
 |-- updated_at: timestamp (nullable = true)
 |-- has_options: integer (nullable = true)
 |-- required_options: integer (nullable = true)

Data schema:
root
 |-- entity_id: long (nullable = true)
 |-- entity_type_id: integer (nullable = true)
 |-- attribute_set_id: integer (nullable = true)
 |-- type_id: string (nullable = true)
 |-- sku: string (nullable = true)
 |-- created_at: timestamp (nullable = true)
 |-- updated_at: timestamp (nullable = true)
 |-- has_options: integer (nullable = true)
 |-- required_options: integer (nullable = true)
 |-- custom_design_from: timestamp (nullable = true)
 |-- custom_design_to: timestamp (nullable = true)
 |-- news_from_date: timestamp (nullable = true)
 |-- news_to_date: timestamp (nullable = true)
 |-- price_update_type_revert_date: timestamp (nullable = true)
 |-- special_from_date: timestamp (nullable = true)
 |-- special_to_date: timestamp (nullable = true)

To overwrite your schema or change partitioning, please set:
'.option("overwriteSchema", "true")'.

Note that the schema can't be overwritten when using
'replaceWhere'.
```
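Diffing the two schemas in the message makes the mismatch concrete: no column is missing or retyped, and the data simply carries seven extra timestamp columns whose names look like the pivoted `attribute_code` values. The minimal plain-Python sketch below assumes flat schemas printed in Spark's ` |-- name: type` tree format. `parse_schema` and `schema_diff` are hypothetical helpers that work on the printed text, not on a live `StructType`.

```python
import re
from typing import Dict, List, Tuple

# Schemas copied from the error message (Spark treeString format, flat fields only).
TABLE_SCHEMA = """root
 |-- entity_id: long (nullable = true)
 |-- entity_type_id: integer (nullable = true)
 |-- attribute_set_id: integer (nullable = true)
 |-- type_id: string (nullable = true)
 |-- sku: string (nullable = true)
 |-- created_at: timestamp (nullable = true)
 |-- updated_at: timestamp (nullable = true)
 |-- has_options: integer (nullable = true)
 |-- required_options: integer (nullable = true)
"""
DATA_SCHEMA = TABLE_SCHEMA + """\
 |-- custom_design_from: timestamp (nullable = true)
 |-- custom_design_to: timestamp (nullable = true)
 |-- news_from_date: timestamp (nullable = true)
 |-- news_to_date: timestamp (nullable = true)
 |-- price_update_type_revert_date: timestamp (nullable = true)
 |-- special_from_date: timestamp (nullable = true)
 |-- special_to_date: timestamp (nullable = true)
"""

FIELD = re.compile(r"^\s*\|?--\s*(\w+):\s*(\w+)")


def parse_schema(tree: str) -> Dict[str, str]:
    """Turn ' |-- name: type (nullable = ...)' lines into {name: type}."""
    fields: Dict[str, str] = {}
    for line in tree.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


def schema_diff(table: Dict[str, str], data: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, retyped) column names of data relative to table."""
    added = [c for c in data if c not in table]
    removed = [c for c in table if c not in data]
    retyped = [c for c in table if c in data and table[c] != data[c]]
    return added, removed, retyped


added, removed, retyped = schema_diff(parse_schema(TABLE_SCHEMA), parse_schema(DATA_SCHEMA))
print(removed, retyped)  # [] []
print(added)
```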

I have two main questions.

  1. The error suggests `.option("overwriteSchema", "true")` or `.option("mergeSchema", "true")`. I am familiar with these options in a regular Delta PySpark job, but I cannot find any documentation on how to enable 'overwriteSchema' in Delta Live Tables.
  2. Why does the job take the table schema from `entityDf` rather than from the DataFrame `df` that the function actually returns?
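For comparison, the two options named in the error reconcile columns differently. The following is a simplified plain-Python model that assumes only top-level column names matter (types, nested fields and partitioning are ignored). `resolve_columns` is hypothetical; it is not how Delta implements schema evolution, and it does not show how to set either option in DLT.

```python
from typing import List, Optional


def resolve_columns(table_cols: List[str], data_cols: List[str], mode: Optional[str] = None) -> List[str]:
    """Simplified model of how a Delta write reconciles top-level columns.

    mode=None        -> new data columns are rejected (schema enforcement)
    mode="merge"     -> table columns kept, new columns appended (mergeSchema)
    mode="overwrite" -> the table takes the data's columns (overwriteSchema)
    """
    new_cols = [c for c in data_cols if c not in table_cols]
    if mode == "overwrite":
        return list(data_cols)
    if mode == "merge":
        return list(table_cols) + new_cols
    if mode is not None:
        raise ValueError(f"unknown mode: {mode!r}")
    if new_cols:
        raise ValueError(f"schema mismatch; new columns in data: {new_cols}")
    return list(table_cols)


table = ["entity_id", "sku"]
data = ["entity_id", "sku", "news_from_date"]
print(resolve_columns(table, data, mode="merge"))  # ['entity_id', 'sku', 'news_from_date']
print(resolve_columns(table, ["entity_id", "news_from_date"], mode="overwrite"))  # ['entity_id', 'news_from_date']
try:
    resolve_columns(table, data)
except ValueError as err:
    print(err)  # schema mismatch; new columns in data: ['news_from_date']
```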
Oliver
  • Hmm, DLT should have mergeSchema enabled... Which channel are you using? `current`? – Alex Ott Sep 14 '22 at 13:45
  • @AlexOtt I haven't selected any preview features and I see no mention of preview so I can only assume it's the stable/current release. The workspace was created from their tooling to an AWS backplane. – Oliver Sep 14 '22 at 13:49
  • Yes, I can confirm the channel is set to Current. – Oliver Sep 14 '22 at 14:42
  • I am running into the same issue. Did you find a way around it, other than deleting the dlt data (system, events) in storage, @Oliver? – Michael Brenndoerfer Oct 20 '22 at 02:09
  • I haven't. I was redirected to ask the Databricks community and got no feedback from our Databricks representative. Unfortunately, other things came up and I never got to the bottom of it. – Oliver Oct 21 '22 at 08:45
  • We are having the exact same issue with DLT. Has anyone had any luck with it? – nee21 Mar 27 '23 at 17:01
  • This error occurs when you try to write with a different schema. Just remember that your table will be named after the function, since you haven't defined the name parameter. – Khalil Fall May 29 '23 at 19:29
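The last comment's point can be illustrated with a toy decorator: when no `name` is passed, the table is registered under the decorated function's name, so `silver_catalog_product` is both the function and the target table. This is a hypothetical plain-Python mimic written for illustration, not the `dlt` module. In a real pipeline the table name is set with the `name` argument of `@dlt.table`.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry standing in for the pipeline's target tables (NOT the dlt module).
TABLES: Dict[str, Callable] = {}


def table(name: Optional[str] = None, **options) -> Callable[[Callable], Callable]:
    """Hypothetical mimic of a @dlt.table-style decorator's naming rule."""
    def register(func: Callable) -> Callable:
        # No explicit name: fall back to the decorated function's own name.
        TABLES[name or func.__name__] = func
        return func
    return register


@table(table_properties={"quality": "silver"})
def silver_catalog_product():
    return "rows"


@table(name="silver_catalog_product_v2", table_properties={"quality": "silver"})
def build_catalog_product():
    return "rows"


print(sorted(TABLES))  # ['silver_catalog_product', 'silver_catalog_product_v2']
```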

0 Answers