
I have received a requirement: data is incrementally copied into a Bronze-layer live table. Once the data is in the Bronze layer, I need to apply data quality checks, and the final data needs to be loaded into a Silver live table. I have no idea how to approach this.

Could anyone please help me write the code for this using PySpark in Databricks?


2 Answers


You need to follow the DLT Python tutorial.

  • Declare a live table for your bronze layer using Auto Loader or another source type:
@dlt.table
def bronze():
  # incrementally ingest raw files with Auto Loader; adjust the options to your source
  return (spark.readStream.format("cloudFiles")
          .option("cloudFiles.format", "json")  # assumed format; change as needed
          .load(input_path))
  • Declare the silver layer that performs the data transformations and enforces data quality checks using expectations (a fuller sketch with a transformation step follows below):
@dlt.table
@dlt.expect_or_drop("col1_not_null", "col1 is not null")
def silver():
  # read the bronze live table incrementally and return the result
  return dlt.read_stream("bronze")
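If the silver table also needs to reshape the data, the transformations can simply be chained before the return. A minimal sketch of the same silver function, assuming (purely as placeholders) a string timestamp column event_ts and a numeric amount column coming from bronze:

import dlt
from pyspark.sql import functions as F

@dlt.table
@dlt.expect_or_drop("col1_not_null", "col1 is not null")
def silver():
  # read bronze incrementally, then clean it up before it is written to the silver table
  return (dlt.read_stream("bronze")
          .withColumn("event_ts", F.to_timestamp("event_ts"))    # placeholder column
          .withColumn("amount", F.col("amount").cast("double"))  # placeholder column
          .dropDuplicates(["col1"]))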
  • I have to write the data validations dynamically. The column names need to be passed dynamically in the code. – Venkatesh Apr 28 '23 at 11:57
  • They can be dynamic: https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/expectations#make-expectations-portable-and-reusable – Alex Ott Apr 28 '23 at 12:07
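Following the "portable and reusable" pattern from that link, one way to make the checks dynamic is to keep the rules as a name-to-constraint mapping (it could just as well be loaded from a config file or a Delta table) and apply them all at once with expect_all_or_drop. A minimal sketch; the rule names, columns, and constraints below are only placeholders:

import dlt

# assumed config: expectation name -> SQL boolean constraint, built or loaded at pipeline start
rules = {
    "col1_not_null": "col1 IS NOT NULL",
    "col2_positive": "col2 > 0",
}

@dlt.table
@dlt.expect_all_or_drop(rules)  # drop every row that violates at least one rule
def silver():
  return dlt.read_stream("bronze")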

You can refer to the Databricks documentation, as the task is fairly basic.

For ingestion into the bronze layer - Auto Loader

For the bronze layer to the silver layer (applying constraints) - https://learn.microsoft.com/en-us/azure/databricks/delta-live-tables/expectations (a short sketch of the expectation flavours follows below)
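As a quick orientation for that expectations page: the decorators come in three flavours that differ only in what happens to rows violating the constraint (keep and report, drop, or fail the update). A hedged sketch with placeholder table and column names:

import dlt

@dlt.table
@dlt.expect("valid_id", "id IS NOT NULL")                    # record violations, keep the rows
@dlt.expect_or_drop("valid_amount", "amount >= 0")           # drop offending rows
@dlt.expect_or_fail("valid_date", "event_date IS NOT NULL")  # fail the update on any violation
def silver():
  return dlt.read_stream("bronze")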