
I have received a requirement: data is incrementally copied into a Bronze-layer live table. Once the data is in the Bronze layer, I need to apply data quality checks, and the final data needs to be loaded into a Silver live table. I have no idea how to approach this.

Could anyone please help me write the code for this using PySpark in Databricks?


2 Answers


You need to follow the DLT Python tutorial.

  • Declare a live table for your bronze layer using Auto Loader or another source type:
@dlt.table
def bronze():
  # incrementally ingest raw files with Auto Loader; adjust the options to your source
  return (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "json")  # assumed format; change as needed
          .load(input_path))
  • Declare the silver layer that performs the data transformations and enforces data quality checks using expectations (a fuller sketch with a transformation step follows below):
@dlt.table
@dlt.expect_or_drop("col1_not_null", "col1 is not null")
def silver():
  # read the bronze live table incrementally and return the result
  return dlt.read_stream("bronze")
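If the silver table also needs to reshape the data, the transformations can simply be chained before the return. A minimal sketch of the same silver function, assuming (purely as placeholders) a string timestamp column event_ts and a numeric amount column coming from bronze:

import dlt
from pyspark.sql import functions as F

@dlt.table
@dlt.expect_or_drop("col1_not_null", "col1 is not null")
def silver():
  # read bronze incrementally, then clean it up before it is written to the silver table
  return (dlt.read_stream("bronze")
          .withColumn("event_ts", F.to_timestamp("event_ts"))    # placeholder column
          .withColumn("amount", F.col("amount").cast("double"))  # placeholder column
          .dropDuplicates(["col1"]))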
  • I have to write the data validations dynamically. The column names need to be passed dynamically in the code. – Venkatesh Apr 28 '23 at 11:57
  • They can be dynamic: https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/expectations#make-expectations-portable-and-reusable – Alex Ott Apr 28 '23 at 12:07
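Following the "portable and reusable" pattern from that link, one way to make the checks dynamic is to keep the rules as a name-to-constraint mapping (it could just as well be loaded from a config file or a Delta table) and apply them all at once with expect_all_or_drop. A minimal sketch; the rule names, columns, and constraints below are only placeholders:

import dlt

# assumed config: expectation name -> SQL boolean constraint, built or loaded at pipeline start
rules = {
    "col1_not_null": "col1 IS NOT NULL",
    "col2_positive": "col2 > 0",
}

@dlt.table
@dlt.expect_all_or_drop(rules)  # drop every row that violates at least one rule
def silver():
  return dlt.read_stream("bronze")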

You can refer to the Databricks documentation, as the task is fairly basic.

For ingestion into the bronze layer - Auto Loader

For the bronze layer to the silver layer (applying constraints) - https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/expectations (a short sketch of the expectation flavours follows below)
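As a quick orientation for that expectations page: the decorators come in three flavours that differ only in what happens to rows violating the constraint (keep and report, drop, or fail the update). A hedged sketch with placeholder table and column names:

import dlt

@dlt.table
@dlt.expect("valid_id", "id IS NOT NULL")                    # record violations, keep the rows
@dlt.expect_or_drop("valid_amount", "amount >= 0")           # drop offending rows
@dlt.expect_or_fail("valid_date", "event_date IS NOT NULL")  # fail the update on any violation
def silver():
  return dlt.read_stream("bronze")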