Questions tagged [delta-live-tables]

Databricks Delta Live Tables (DLT) is an ETL framework that uses a declarative approach to build reliable data pipelines and automatically manage infrastructure at scale.

Delta Live Tables simplifies the development of reliable data pipelines in Python and SQL by automatically handling dependencies between components, enforcing data quality, and reducing administrative overhead with automatic cluster and data maintenance, among other features.
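The declarative model described above can be illustrated with a minimal, self-contained sketch. This is a toy stand-in, not the real `dlt` API (which is only available inside a Databricks pipeline): table definitions register themselves via a decorator, `read()` expresses a dependency, and the framework materializes tables in dependency order rather than in the order they are written.

```python
# Toy sketch of DLT's declarative model (hypothetical stand-in for the
# real `dlt` module, which only exists inside a Databricks pipeline).
_registry = {}   # table name -> definition function
_results = {}    # table name -> materialized rows

def table(func):
    """Register a table definition, as @dlt.table does conceptually."""
    _registry[func.__name__] = func
    return func

def read(name):
    """Resolve a dependency, materializing it on first use."""
    if name not in _results:
        _results[name] = _registry[name]()
    return _results[name]

@table
def bronze_events():
    # pretend this is raw ingested data
    return [{"id": 1, "amount": 10}, {"id": 2, "amount": -5}]

@table
def silver_events():
    # data-quality rule: drop rows with negative amounts
    return [r for r in read("bronze_events") if r["amount"] >= 0]

print(read("silver_events"))  # -> [{'id': 1, 'amount': 10}]
```

Because `silver_events` declares its input by calling `read("bronze_events")`, the "framework" knows bronze must run first — the same idea that lets real DLT infer a pipeline graph from `dlt.read`/`dlt.read_stream` calls.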

149 questions
2 votes · 0 answers

Overwrite Scheme on Delta Live Tables workflow

I am new to Delta Live Tables and have been working with a relatively simple pipeline. The table that I am having an issue with is as follows: @dlt.table( table_properties={ "quality" : "silver" } ) def silver_catalog_product(): …
Oliver
2 votes · 1 answer

Delta Live Table able to write to ADLS?

I have an architectural requirement to store the data in ADLS under a medallion model, and I am trying to write to ADLS using Delta Live Tables as a precursor to creating the Delta Table. I've had success using CREATE TABLE…
2 votes · 1 answer

Use DLT table from one pipeline in another pipeline

If I have a DLT pipeline that creates a streaming live table called customers, how can I use that table in another pipeline? So, Pipeline A: CREATE OR REFRESH STREAMING LIVE TABLE customers AS Pipeline B: CREATE OR REFRESH STREAMING LIVE TABLE…
AndyMN
2 votes · 0 answers

Incrementally reading and aggregating parquet files from S3 using Databricks DLT

I am trying to use DLT for incremental processing where the inputs are parquet files arriving daily on S3. I am told that dlt.read_stream can help. I was able to incrementally read the files, but when I perform aggregations, it is doing wide…
2 votes · 1 answer

How can I control the order of Databricks Delta Live Tables' (DLT) creation for pipeline development?

I am developing a Databricks Pipeline, writing my DLTs in Python. I want to understand how to control the Pipeline's order of creation of DLTs. Currently, the Pipeline attempts to create every single DLT in the order that they're written in,…
JJ Kam
2 votes · 3 answers

Delta live tables - Slowly changing dimensions

Is it possible to create a Slowly Changing Dimension mechanism using Delta Live Tables? I would like to implement something like this https://docs.databricks.com/_static/notebooks/merge-in-scd-type-2.html But in the DLT docs I found "Processing…
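For background on the SCD question above: a Type 2 merge closes the current version of a changed row and inserts a new open-ended version alongside it. A minimal pure-Python sketch of that merge logic, with hypothetical field names (`address`, `start_date`, `end_date`) and no Spark, DLT, or `MERGE` statement involved:

```python
from datetime import date

def scd2_merge(dim, updates, key, today):
    """Toy SCD Type 2 merge on lists of dicts: close the current row
    when a tracked attribute changes, insert the new open version."""
    out = [dict(r) for r in dim]          # don't mutate the input
    for upd in updates:
        for row in out:
            if row[key] == upd[key] and row["end_date"] is None:
                if row["address"] != upd["address"]:
                    row["end_date"] = today   # close the old version
                    out.append({**upd, "start_date": today, "end_date": None})
                break
        else:
            # key never seen before: insert as a brand-new open row
            out.append({**upd, "start_date": today, "end_date": None})
    return out

dim = [{"cust_id": 1, "address": "Old St",
        "start_date": date(2020, 1, 1), "end_date": None}]
updates = [{"cust_id": 1, "address": "New Ave"}]
result = scd2_merge(dim, updates, "cust_id", date(2023, 6, 1))
# result now holds two versions of cust_id 1:
# the closed old row and the open new row
```

The linked Databricks notebook does the same thing at scale with a Delta `MERGE`; inside DLT the equivalent is handled declaratively rather than with a hand-written merge.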
2 votes · 1 answer

Databricks - Read Streams - Delta Live Tables

I have a number of tables (with varying degrees of differences in schemas but with a common set of fields) that I would like to union and load from bronze to silver incrementally. So the goal is to go from multiple tables to a single…
Trista_456
1 vote · 1 answer

Move managed DLT table from one schema to another schema in Databricks

I have a DLT table in schema A which is being loaded by a DLT pipeline. I want to move the table from schema A to schema B and repoint my existing DLT pipeline to the table in schema B. Also, I need to avoid a full reload in the DLT pipeline on the table in Schema…
Athi
1 vote · 1 answer

Limited options for source code path for Delta Live Tables (DLT)

When I run jobs, I can point to a file on GitHub or Azure DevOps and specify the branch I want the job to read from. However, when I create a DLT pipeline, I can only point to files on Databricks, and I cannot specify a branch. Pointing to a shared…
Oliver Angelil
1 vote · 1 answer

How to directly differentiate between a full refresh and an incremental update for a Delta Live Table?

I have tables that travel from bronze to silver to gold. I want to implement a function like 'is_full_refresh()' so the pipeline filters the df depending on the output: if it's a full refresh, don't filter; if it's incremental, filter by a, b, c. Checking…
1 vote · 1 answer

How to find namespace of tables in Delta Live Tables to query?

I created a pipeline using Delta Live Tables. How do I know the namespace of the tables? The name of this DLT pipeline is "dlt_test", so I tried select * from dlt_test.live_gold and select * from dlt_test_dlt_db.live_gold However, both failed and…
user3692015
1 vote · 0 answers

Incremental ingestion of Snowflake data with Delta Live Table (CDC)

I have some data sitting in Snowflake, and I want to apply CDC to it using Delta Live Tables, but I am having some issues. Here is what I am trying to do: @dlt.view() def table1(): return…
1 vote · 0 answers

How to capture dropped events in a PySpark Structured Streaming job

I have a PySpark streaming job which drops duplicate events by session id. I have a 30-minute watermark window. Snippet: unique_df = df.withColumn("timestamp", current_timestamp()).dropDuplicates(session_id).withWatermarking("timestamp", 30) I…
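As background for the watermarked-deduplication question above: the idea is to keep per-key state only within a time window, dropping duplicates that arrive inside it — and capturing the dropped events is just a matter of collecting them instead of discarding them. A plain-Python toy with hypothetical names, standing in for Spark's `dropDuplicates` combined with `withWatermark` (not the actual Spark API):

```python
from datetime import datetime, timedelta

def dedup_with_watermark(events, window=timedelta(minutes=30)):
    """Toy dedup-within-a-window: events are (session_id, timestamp)
    pairs; a session_id seen again within `window` of its last kept
    occurrence is dropped, and dropped events are captured."""
    last_seen = {}               # session_id -> timestamp of last kept event
    kept, dropped = [], []
    for session_id, ts in events:
        prev = last_seen.get(session_id)
        if prev is not None and ts - prev <= window:
            dropped.append((session_id, ts))   # duplicate inside the window
        else:
            last_seen[session_id] = ts
            kept.append((session_id, ts))
    return kept, dropped

t0 = datetime(2023, 1, 1, 12, 0)
events = [("a", t0),
          ("a", t0 + timedelta(minutes=10)),   # within 30 min -> dropped
          ("a", t0 + timedelta(minutes=45))]   # outside window -> kept
kept, dropped = dedup_with_watermark(events)
```

In real Structured Streaming the `dropped` side is not exposed directly; capturing it typically means comparing the input stream against the deduplicated output, which is what makes this question non-trivial.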
1 vote · 1 answer

Effect of "table_properties" property for Delta Live Tables

I have the following code: @dlt.table( name="ingested_data", comment="Ingest the table", table_properties={ "quality": "raw", "name": "property_name" } ) I am confused about what the table_properties dictionary does in practice. I…
Oliver Angelil
1 vote · 1 answer

Databricks Delta Live Tables (DLT) file format (notebooks or .py files?)

I noticed that it is possible to write DLT pipelines in both Databricks notebooks and .py files. Is there a recommended approach?
Oliver Angelil