Questions tagged [delta-live-tables]

Databricks Delta Live Tables (DLT) is a declarative ETL framework for building reliable data pipelines and automatically managing your infrastructure at scale.

Delta Live Tables simplifies development of reliable data pipelines in Python and SQL by providing a framework that automatically handles dependencies between components, enforces data quality, and removes administrative overhead with automatic cluster and data maintenance, ...
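A minimal sketch of what such a declarative pipeline looks like in Python (table names, the storage path, and the expectation are illustrative; this fragment only runs inside a Databricks DLT pipeline, where `spark` is provided by the runtime, not as a standalone script):

```python
import dlt
from pyspark.sql import functions as F

# Bronze: declaratively ingest raw data; DLT manages the underlying infrastructure.
@dlt.table(comment="Raw events loaded incrementally with Auto Loader")
def events_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events")  # illustrative path
    )

# Silver: DLT infers the dependency on events_bronze and enforces the expectation,
# dropping rows where id is NULL.
@dlt.table(comment="Cleaned events")
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")
def events_silver():
    return dlt.read_stream("events_bronze").withColumn("ingested_at", F.current_timestamp())
```

DLT resolves the dependency graph from the `dlt.read_stream` calls, so no orchestration code is needed.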

149 questions
1 vote · 0 answers

converting pyspark dataframe column values to list in DLT

How can I convert PySpark DataFrame column values to a list in DLT? I tried using collect(), toPandas(), collect_list(), and toLocalIterator to convert the DataFrame df_year (which has a single column of year data) to a list, but these are not supported in a DLT pipeline.…
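Driver-side actions such as `collect()` or `toPandas()` cannot be applied to the streaming sources DLT manages, so a list like this is normally computed outside the `@dlt.table` function against a batch DataFrame. The extraction itself is just a comprehension over row-like objects; a pure-Python sketch (using dicts to stand in for Spark `Row`s, since Spark is not available here):

```python
# With a batch (non-streaming) DataFrame you could write:
#   years = [row["year"] for row in df_year.select("year").collect()]
# Inside a DLT pipeline function the source is streaming, so collect()/toPandas()
# are unavailable; compute such lists outside the @dlt.table function instead.

def column_to_list(rows, column):
    """Extract one column from a sequence of row-like mappings."""
    return [row[column] for row in rows]

sample = [{"year": 2020}, {"year": 2021}, {"year": 2022}]
print(column_to_list(sample, "year"))  # → [2020, 2021, 2022]
```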
1 vote · 1 answer

how to create pipeline in databricks using delta live table to read data from kafka

Reading from Kafka worked: raw_kafka_test = (spark.readStream .etc ) @dlt.table( table_properties={"pipelines.reset.allowed":"false"} ) def raw_kafka(): return raw_kafka_test. Reading from the delta live table did not work: @dlt.table( …
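A common shape for this pipeline is to define the Kafka read inside one `@dlt.table` function and have downstream tables read the DLT table by name with `dlt.read_stream`, rather than referencing the Python variable. A hedged sketch (broker address and topic are placeholders; runs only inside a DLT pipeline):

```python
import dlt
from pyspark.sql.functions import col

@dlt.table(table_properties={"pipelines.reset.allowed": "false"})
def raw_kafka():
    return (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")  # placeholder
        .option("subscribe", "events")                      # placeholder topic
        .load()
    )

# Downstream tables reference the DLT dataset by name, not the Python variable.
@dlt.table
def parsed_kafka():
    return dlt.read_stream("raw_kafka").select(col("value").cast("string"))
```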
1 vote · 2 answers

Delta live tables data validation in databricks

I have received a requirement: data is incrementally copied to a Bronze layer live table. Once the data is in the bronze layer, I need to apply data quality checks, and the final data needs to be loaded into a silver live table. I don't have an idea on…
Venkatesh · 91 · 1 · 9
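The bronze-to-silver quality gate described above maps directly onto DLT expectations. A minimal sketch, assuming a bronze table named `orders_bronze` and illustrative rule predicates (rows failing a rule are dropped; `@dlt.expect_all` would instead record violations without dropping):

```python
import dlt

# Named expectations; rows failing any rule are dropped from the silver table,
# and violation counts appear in the pipeline's event log.
quality_rules = {
    "valid_id": "id IS NOT NULL",
    "valid_amount": "amount >= 0",
}

@dlt.table(comment="Silver table with data quality checks applied")
@dlt.expect_all_or_drop(quality_rules)
def orders_silver():
    return dlt.read_stream("orders_bronze")  # assumed bronze table name
```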
1 vote · 1 answer

Triggering a Databricks Delta Live Table from Azure Data Factory resets all the tables. How do I disable that?

I have created a pipeline in Azure Data Factory that triggers a Delta Live Table in Azure Databricks through a Web activity, as described in the Microsoft documentation. My problem is that when I trigger my DLT pipeline from ADF, it resets all the tables,…
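The usual culprit is the request body sent to the start-update endpoint: passing `"full_refresh": true` resets the tables, while `false` (the default) runs an incremental update. A small pure-Python helper showing the path and JSON body the ADF Web activity would need (pipeline id is hypothetical):

```python
import json

API_PATH = "/api/2.0/pipelines/{pipeline_id}/updates"

def start_update_request(pipeline_id, full_refresh=False):
    """Build the path and JSON body for the DLT 'start update' REST call.
    full_refresh=False triggers an incremental update instead of a reset."""
    path = API_PATH.format(pipeline_id=pipeline_id)
    body = json.dumps({"full_refresh": full_refresh})
    return path, body

path, body = start_update_request("1234-abcd")  # hypothetical pipeline id
print(path)  # → /api/2.0/pipelines/1234-abcd/updates
print(body)  # → {"full_refresh": false}
```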
1 vote · 1 answer

Checkpointing for delta live table using triggered mode

I have a use case where I need to run the delta live table in triggered mode and would like to know if there are any capabilities around checkpointing in triggered mode. My source is a streaming one where data arrives at second granularity, and I…
Shane · 588 · 6 · 20
1 vote · 1 answer

How to know DLT pipeline run status (failed, completed) using REST API?

Is there a way to find out the DLT pipeline run status, i.e. whether the pipeline failed or succeeded? I was looking into the API https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-api-guide.html#pipelinestateinfo — State in below doesn't…
Chhaya Vishwakarma · 1,407 · 9 · 44 · 72
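The pipeline-level `state` field only reports things like RUNNING or IDLE; the per-run result (COMPLETED, FAILED, ...) lives in the `latest_updates` list of the GET `/api/2.0/pipelines/{pipeline_id}` response. A small helper over that response shape (field names follow the Pipelines API; the example dict is abbreviated and illustrative):

```python
def latest_update_state(pipeline_info):
    """Return the state of the most recent update from a
    GET /api/2.0/pipelines/{pipeline_id} response dict."""
    updates = pipeline_info.get("latest_updates") or []
    return updates[0]["state"] if updates else None

# Abbreviated, illustrative response: newest update first.
resp = {
    "pipeline_id": "abc",
    "state": "IDLE",
    "latest_updates": [
        {"update_id": "u2", "state": "FAILED"},
        {"update_id": "u1", "state": "COMPLETED"},
    ],
}
print(latest_update_state(resp))  # → FAILED
```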
1 vote · 1 answer

Databricks SQL watermark syntax

I need some help with watermark syntax in a DLT SQL pipeline setup. I want to load combined data from two silver layer streaming tables into a single table with watermarking so it can capture late updates, but I'm getting a syntax error. SQL query to…
suki adhi · 11 · 2
1 vote · 2 answers

How to get the checkpoint location of delta live table?

Suppose you have already used a checkpoint to update the delta table (external table) with Auto Loader. How can I find out its checkpoint location? I tried running the code below, but it didn't work in my environment. SELECT * FROM sys.tables WHERE name LIKE…
1 vote · 1 answer

Azure Databricks Delta Live Table stored as SCD 2 is creating new records when no data changes

I have a streaming pipeline that ingests JSON files from a data lake. These files are dumped there periodically. Mostly the files contain duplicate data, but there are occasional changes. I am trying to process these files into a data warehouse…
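With `dlt.apply_changes` and `stored_as_scd_type=2`, spurious new versions typically come from volatile metadata columns (e.g. a load timestamp) that change on every file even when the business data does not. A hedged sketch, with assumed source/target names and an assumed `_load_ts` metadata column excluded from the target so it cannot trigger new SCD2 rows (newer runtimes also offer `track_history_except_column_list` to keep such columns without versioning on them):

```python
import dlt
from pyspark.sql.functions import col

# In older releases this was dlt.create_target_table.
dlt.create_streaming_table("customers_scd2")

dlt.apply_changes(
    target="customers_scd2",
    source="customers_staged",        # assumed staging view/table
    keys=["customer_id"],
    sequence_by=col("event_ts"),      # assumed ordering column
    except_column_list=["_load_ts"],  # assumed volatile metadata column to drop
    stored_as_scd_type=2,
)
```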
1 vote · 1 answer

Databricks DLT pipeline Error "AnalysisException: Cannot redefine dataset"

I am getting this error "AnalysisException: Cannot redefine dataset" in my DLT pipeline. I am using a for loop to trigger multiple flows. I am trying to load different sources into the same target using dlt.create_target_table and dlt.apply_changes.…
BobGally · 11 · 2
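"Cannot redefine dataset" in a for loop usually means every iteration registered a dataset under the same function name, or a closure captured the loop variable late. The standard fix is a factory function that binds the name per iteration. A pure-Python illustration of the pattern (the string return value stands in for a real DataFrame; in DLT you would decorate `table_fn` with `@dlt.table(name=table_name)` inside the factory):

```python
def make_table_fn(table_name):
    """Factory: binds table_name at call time, so each generated
    function is a distinct dataset with a unique name."""
    def table_fn():
        return f"loading {table_name}"   # stands in for the real DataFrame
    table_fn.__name__ = table_name       # each DLT dataset needs a unique name
    return table_fn

tables = [make_table_fn(name) for name in ["orders", "customers"]]
print([t.__name__ for t in tables])  # → ['orders', 'customers']
print(tables[0]())                   # → loading orders
```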
1 vote · 1 answer

DLT Stream Error - Queries with streaming sources must be executed with writeStream.start();

I'm trying to parse incoming variable length stream records in databricks using Delta Live Tables. I'm getting the error: Queries with streaming sources must be executed with writeStream.start(); Notebook code @dlt.table ( comment="xAudit…
J. Johnson · 11 · 1
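This error appears when an action (collect(), count(), display()) or a manual writeStream is applied to a streaming source inside the pipeline; in DLT the function should only transform and return the streaming DataFrame, and the framework manages the sink and checkpoint. A hedged sketch with assumed dataset names:

```python
import dlt

# Never call writeStream.start() yourself in DLT; return the streaming
# DataFrame and the framework handles execution.
@dlt.table(comment="Parsed variable-length audit records")
def xaudit_parsed():
    df = dlt.read_stream("xaudit_raw")   # assumed upstream table
    # ...transformations only: no collect(), count(), display(), or
    # writeStream here, since those force execution of a streaming source.
    return df
```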
1 vote · 1 answer

Databricks DLT pipeline with for..loop reports error "AnalysisException: Cannot redefine dataset"

I have the following code which works fine for a single table. But when I try to use a for..loop() to process all the tables in my database, I am getting the error, "AnalysisException: Cannot redefine dataset…
Yuva · 2,831 · 7 · 36 · 60
1 vote · 1 answer

Schema Changes not Allowed on Delta Live Tables Full Refresh

I have a simple Delta Live Tables pipeline that performs a streaming read of multiple CSV files from cloudFiles (S3 storage) into a delta table published to the Hive metastore. I have two requirements that make my situation more complex/unique: I…
1 vote · 0 answers

How to use SCD type 1 using Delta Live Table

I am looking for a real-world example of applying SCD type 1 using Delta Live Tables. I tried the reference from the official DLT documentation but was not able to get the exact answer.
Siddhu · 19 · 4
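SCD type 1 uses the same `dlt.apply_changes` API as type 2, but the target keeps only the latest version of each key. A minimal hedged sketch with assumed source/target names and ordering column (runs only inside a DLT pipeline):

```python
import dlt
from pyspark.sql.functions import col

# In older releases this was dlt.create_target_table.
dlt.create_streaming_table("customers")

dlt.apply_changes(
    target="customers",
    source="customers_updates",       # assumed CDC source view/table
    keys=["customer_id"],
    sequence_by=col("sequence_num"),  # assumed ordering column
    stored_as_scd_type=1,             # upsert in place; no history rows
)
```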
1 vote · 1 answer

How to configure path for Delta Live Table in cloud_files

I am new to Databricks Delta Live Tables. I have a few small doubts and need your help to understand the concept behind it; I cannot proceed without this. I have a file in an Azure Data Lake container, and I know that I need to give the…
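For an Azure Data Lake container, the path passed to Auto Loader is normally an `abfss://` URI. A hedged sketch in Python (the container and storage-account names are placeholders, and the format is assumed to be CSV; the SQL equivalent passes the same path to `cloud_files('<path>', 'csv')`):

```python
import dlt

@dlt.table
def raw_files():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "csv")  # assumed file format
        # abfss://<container>@<storage-account>.dfs.core.windows.net/<dir>/
        .load("abfss://<container>@<storage-account>.dfs.core.windows.net/input/")
    )
```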