
We have an Azure Data Factory dataflow that sinks into Delta. We have the Overwrite and Allow insert options set and Vacuum = 1. When we run the pipeline repeatedly with no change in the table structure, the pipeline is successful. But when the structure of the table being sunk changes, e.g., data types change, the pipeline fails with the error below.

Error code: DFExecutorUserError
Failure type: User configuration issue

Details: Job failed due to reason: at Sink 'ConvertToDelta': Job aborted.


We tried setting Vacuum to 0 and back, enabling Merge schema, and switching back and forth between Overwrite and Truncate, but the pipeline still failed.
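For context, the same failure can be reproduced outside ADF: Delta's schema enforcement rejects a write whose column types differ from the existing table unless schema evolution is enabled. A minimal PySpark sketch (the path is illustrative, and a SparkSession with the delta-spark package configured is assumed):

# Write a Delta table with one schema, then overwrite it with changed types.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # assumes delta-spark is configured
path = "/tmp/demo_delta"  # illustrative path

# First run: column v is a long.
spark.range(5).withColumn("v", F.col("id").cast("long")) \
    .write.format("delta").mode("overwrite").save(path)

# Second run: v becomes a string. Without schema evolution this raises
# an AnalysisException (schema mismatch), mirroring the sink failure above.
spark.range(5).withColumn("v", F.col("id").cast("string")) \
    .write.format("delta").mode("overwrite").save(path)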

GVFLUSA

3 Answers


Can you try enabling Delta Lake's schema evolution (more information)? By default, Delta Lake has schema enforcement enabled, which means a change to the source schema is not allowed and results in an error.

Even with overwrite enabled, the write will fail unless you specify schema evolution, because by default the schema cannot be changed.
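In ADF's Delta sink this corresponds to the Merge schema option; in raw Spark terms, the equivalent write options are shown in this minimal sketch (paths are illustrative, and a SparkSession with delta-spark configured is assumed):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # assumes delta-spark is configured
df = spark.read.parquet("/data/input")  # illustrative source

# Overwrite and allow the table schema to be replaced in the same write:
(df.write
    .format("delta")
    .mode("overwrite")
    .option("overwriteSchema", "true")  # permit the schema change on overwrite
    .save("/data/output/delta"))  # illustrative sink path

# For append-style writes, the analogous option merges new columns instead:
# .option("mergeSchema", "true")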

Denny Lee

I created an ADLS Gen2 storage account, created input and output folders, and uploaded a Parquet file into the input folder. I then created a pipeline with the following dataflow:

[Screenshot: dataflow]

I used a Parquet file as the source.

[Screenshot: dataflow source]

[Screenshot: source dataset]

[Screenshot: source data preview]

I created a derived column transformation to change the structure of the table.

[Screenshot: derived column settings]

I changed the datatype of the 'difficulty' column from long to double using the expression below:

difficulty : toDouble(difficulty)

[Screenshot: difficulty cast]

I changed the datatype of the 'transactions_len' column from integer to float using the expression below:

transactions_len : toFloat(transactions_len)

I changed the datatype of the 'number' column from long to string using the expression below:

number : toString(number)

[Screenshot: derived column expressions]
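For comparison, the same three casts can be expressed in PySpark (a minimal sketch; the input path is illustrative):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/data/input")  # illustrative source file

# Apply the same type changes as the derived column above.
df_casted = (
    df.withColumn("difficulty", F.col("difficulty").cast("double"))
      .withColumn("transactions_len", F.col("transactions_len").cast("float"))
      .withColumn("number", F.col("number").cast("string"))
)
df_casted.printSchema()  # verify the new types before sinking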

[Screenshot: derived column data preview]

I used Delta as the sink.

[Screenshot: dataflow sink]

[Screenshot: sink settings]

[Screenshot: sink data preview]

I ran the pipeline and it executed successfully.

[Screenshot: pipeline run]

The output was stored successfully in my storage account's output folder.

[Screenshot: output folder]

The procedure worked on my machine; please recheck from your end.

Bhavani
  • When I changed data types to incompatible ones (e.g., from string to int, or from datetime to int), we get an error. See the sample before and after tables below:
    CREATE TABLE [dbo].[TableBefore](
        [name] int NOT NULL,
        [schema_id] int NOT NULL,
        [create_date] int NOT NULL,
        [modify_date] int NOT NULL,
        [new_column] varchar(6) NOT NULL
    )
    CREATE TABLE [dbo].[TableAfter](
        [name] nvarchar(128) NOT NULL,
        [create_date] datetime NOT NULL,
        [modify_date] datetime NOT NULL,
        [new_column] int NOT NULL
    )
    – GVFLUSA Nov 09 '22 at 14:42
  • Could you please provide information about whether the source table and the destination table are the same? – Bhavani Nov 10 '22 at 06:53
  • Table name has not changed. – GVFLUSA Nov 10 '22 at 14:30

The source (ingestion) Parquet files were generated to Azure Blob Storage with a specific filename. Whenever we generated the source Parquet files without specifying a filename, only a directory, the sink worked.
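A minimal sketch of the difference, assuming the upstream files are produced with PySpark (the paths and container names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("/data/staging")  # illustrative upstream data

# Writing to a directory lets Spark emit part-*.parquet files under it;
# the dataflow source dataset then points at the folder, not a file name.
df.write.mode("overwrite").parquet(
    "abfss://container@account.dfs.core.windows.net/input")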

GVFLUSA