Questions tagged [delta-live-tables]

Databricks Delta Live Tables (DLT) is a declarative ETL framework for building reliable data pipelines while automatically managing your infrastructure at scale.

Delta Live Tables simplifies the development of reliable data pipelines in Python and SQL by providing a framework that automatically handles dependencies between components, enforces data quality, and removes administrative overhead with automatic cluster and data maintenance, ...

149 questions
0
votes
1 answer

Different storage paths depending on checkout branch with Delta Live Tables

How can I change the storage location depending on which branch I am working on? For example, I'd like the storage location when running a DLT pipeline on my feature branch to be different from the storage location when running the pipeline on a…
Oliver Angelil
  • 1,099
  • 15
  • 31
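One way to approach this (a sketch, not a confirmed solution): set a per-branch key in the DLT pipeline's configuration and read it with spark.conf.get, which is how DLT exposes pipeline configuration values to notebook code. The key name and paths below are assumptions.

    import dlt

    # Read a per-branch root path from the pipeline configuration; the key
    # "mypipeline.storage_root" is hypothetical and would be set to different
    # values in the feature-branch and main-branch pipeline settings.
    storage_root = spark.conf.get("mypipeline.storage_root")

    @dlt.table(
        name="bronze_events",
        path=f"{storage_root}/bronze_events",  # table data lands under the branch-specific root
    )
    def bronze_events():
        return spark.read.json(f"{storage_root}/landing/events")  # placeholder source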
0
votes
0 answers

How to deal with unsynced tables in Databricks notebooks

So, my question is: I have a bronze-to-silver ingestion, and later this data will be consumed by gold notebooks. I want to check whether all the tables the notebooks use are up to date. What I thought was to use DLT expectations, and in my golds I will create…
0
votes
0 answers

Where in the Hive Metastore can the S3 locations of Databricks (Spark) tables be found?

I have a few Databricks clusters, some share a single Hive Metastore (HMS), call them PROD_CLUSTERS, and an additional cluster, ADHOC_CLUSTER, which has its own HMS. All my data is stored in S3 as Databricks Delta tables: PROD_CLUSTERS have…
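Rather than querying the HMS database directly, one hedged alternative is to ask Spark itself; DESCRIBE DETAIL reports the storage location of a Delta table. The database name below is a placeholder.

    # Print the S3 location of every Delta table in a database.
    for row in spark.sql("SHOW TABLES IN my_database").collect():
        detail = spark.sql(f"DESCRIBE DETAIL my_database.{row.tableName}").collect()[0]
        print(row.tableName, detail.location)  # "location" is the table's storage path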
0
votes
0 answers

Databricks DLT driver timeout

Sometimes a continuously running DLT pipeline job will lose connection with the driver when it autoscales (enhanced autoscaling is enabled). The exact error message is: INTERNAL_ERROR: Communication lost with driver. Cluster xyz was not reachable…
kyrre
  • 626
  • 2
  • 9
  • 24
0
votes
0 answers

How to achieve updates in a streaming table in Delta Live Tables?

I want to reflect inserts and updates to a table: CREATE OR REFRESH STREAMING LIVE TABLE dlt_stage2_bronze_streaming_students ; APPLY CHANGES INTO LIVE.dlt_stage2_bronze_streaming_students FROM stream(dlt_test.stage2_test_students_raw) KEYS (id) …
user3692015
  • 391
  • 4
  • 15
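For reference, a minimal Python sketch of the same CDC flow using the documented dlt.create_streaming_table / dlt.apply_changes API; the table names come from the question, while the sequence_by column is an assumption since the excerpt truncates before SEQUENCE BY.

    import dlt

    # Create the target streaming table, then merge CDC rows into it by key.
    dlt.create_streaming_table("dlt_stage2_bronze_streaming_students")

    dlt.apply_changes(
        target="dlt_stage2_bronze_streaming_students",
        source="dlt_test.stage2_test_students_raw",
        keys=["id"],
        sequence_by="updated_at",  # assumed ordering column for resolving out-of-order changes
    )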
0
votes
0 answers

Databricks error importing data using cloudFiles

I am trying to import data using cloudFiles in Databricks. Two different ways give me errors. With CREATE OR REFRESH STREAMING LIVE TABLE orders_raw COMMENT "The raw books orders, ingested from orders-raw" AS SELECT * FROM…
gaut
  • 5,771
  • 1
  • 14
  • 45
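The Python form of this Auto Loader ingestion, sketched with a placeholder path and an assumed JSON source format, looks like this:

    import dlt

    @dlt.table(comment="The raw books orders, ingested from orders-raw")
    def orders_raw():
        return (
            spark.readStream.format("cloudFiles")     # Auto Loader
            .option("cloudFiles.format", "json")      # assumption: the source files are JSON
            .load("/mnt/landing/orders-raw")          # placeholder landing path
        )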
0
votes
0 answers

Databricks Workflows: tasks of type 'Delta Live Tables pipeline' fail to retry

I have a DLT pipeline and I want to be able to call it from different workflows. I understand that when an update is already in progress the 'Delta Live Tables pipeline' task will fail. However, I expect it to retry again following the retries…
0
votes
1 answer

Same target for multiple DLT pipelines

Is it possible to configure two DLT pipelines to target the same schema, bearing in mind that each will write to different tables? Can this cause any issues with the consistency of the metadata the DLT pipelines write?
partlov
  • 13,789
  • 6
  • 63
  • 82
0
votes
2 answers

Copy (in delta format) of an append-only incremental table that is in JDBC (SQL)

My ultimate goal is to have a copy (in delta format) of an append-only incremental table that is in JDBC (SQL). I have a batch process reading from the incremental append-only JDBC (SQL) table, with spark.read (since .readStream is not supported for…
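A common batch pattern for this is a high-watermark copy: read the max key already in the Delta copy, pull only newer rows over JDBC, and append. This sketch assumes a monotonically increasing id column, that the target table already exists, and uses placeholder connection details.

    target = "s3://bucket/delta/my_table"  # placeholder Delta path

    # Highest id already copied; `or 0` covers an empty target table.
    last_id = (spark.read.format("delta").load(target)
               .agg({"id": "max"}).collect()[0][0]) or 0

    new_rows = (spark.read.format("jdbc")
                .option("url", "jdbc:sqlserver://host;databaseName=db")  # placeholder
                .option("query", f"SELECT * FROM src_table WHERE id > {last_id}")
                .load())

    new_rows.write.format("delta").mode("append").save(target)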
0
votes
1 answer

How to make Delta Live Tables roll back all transactions in the pipeline when an expectation with 'ON VIOLATION FAIL UPDATE' fails

I am building a Delta Live Tables pipeline and I need to perform complex quality checks on top of the data mart. Following the guidelines of DLT, this should be done in a separate temporary live table where the quality check logic is implemented as…
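As background for answers: in Python the same gate can be written with dlt.expect_or_fail, which aborts the update when any row violates the constraint. One caveat worth hedging: each DLT table write is its own Delta commit, so upstream tables that finished earlier in the update are, to my understanding, not rolled back by the failure. Names below are placeholders.

    import dlt

    @dlt.table
    @dlt.expect_or_fail("valid_amount", "amount >= 0")  # hypothetical quality rule
    def quality_gate():
        # Placeholder upstream dataset; the update fails if any row violates the rule.
        return dlt.read("fact_sales")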
0
votes
2 answers

Copy of Incremental source table with Spark

A source table in a SQL DB increments (new rows) every second. I want to run some Spark code (maybe with Structured Streaming?) once per day (it is okay if the copy is at most one day out of date), to append the new rows since the last time I ran the…
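If the source can be read as a stream (a Delta table, for instance; a plain JDBC table cannot), one sketch is Structured Streaming with the availableNow trigger, which processes everything new since the last checkpoint and then stops, so it suits a once-a-day job. Paths are placeholders.

    (spark.readStream.format("delta").load("/mnt/source_delta")          # streamable source only
        .writeStream.format("delta")
        .option("checkpointLocation", "/mnt/checkpoints/daily_copy")     # remembers progress between runs
        .trigger(availableNow=True)                                      # drain new rows, then stop
        .outputMode("append")
        .start("/mnt/target_delta")
        .awaitTermination())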
0
votes
1 answer

Autoloader vs dlt.read_stream vs dlt.create_streaming_live_table

I have two different use cases and I'm not sure which of the listed functions to use for each: we have a staging dataset that always contains only 1 day of data. Every day I want to append this staged data to an incremental table. I am working from…
Oliver Angelil
  • 1,099
  • 15
  • 31
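To frame the answers: Auto Loader ingests files from outside the pipeline, while dlt.read_stream consumes a table defined inside the same pipeline. A sketch with placeholder names and formats:

    import dlt

    @dlt.table
    def staged():
        # Auto Loader: ingest external files (path and format are assumptions).
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "parquet")
                .load("/mnt/staging/daily"))

    @dlt.table
    def incremental():
        # Stream from a table in this pipeline; new rows are appended downstream.
        return dlt.read_stream("staged")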
0
votes
1 answer

How to load the data from a SQL Server table using Databricks Delta Live Tables

I want to load the data from a SQL Server table using Databricks Delta Live Tables and pass the value to another notebook
bigdata techie
  • 147
  • 1
  • 11
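A DLT table can wrap an ordinary JDBC batch read, which is one hedged way to start; the connection details below are placeholders, and passing values to another notebook would typically happen through the resulting table rather than through DLT itself.

    import dlt

    @dlt.table(comment="Snapshot of a SQL Server table, re-read on each pipeline update")
    def sqlserver_snapshot():
        return (spark.read.format("jdbc")
                .option("url", "jdbc:sqlserver://host:1433;databaseName=db")  # placeholder
                .option("dbtable", "dbo.my_table")                            # placeholder
                .option("user", "etl_user")                                   # placeholder
                .option("password", dbutils.secrets.get("scope", "key"))      # assumed secret scope
                .load())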
0
votes
0 answers

Databricks SQL, can't escape backticks ( ` ) in column names when ingesting .csv

I'm working on ingesting raw .csv files stored in an ADLS storage account into Bronze layer Delta live tables in Databricks. The challenge arises from the column names in the CSV files, which include spaces and backticks ( ` ). Unfortunately, since…
Ben S
  • 1
  • 1
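A common workaround is to sanitize the column names immediately after reading the CSV, before the data reaches the Delta table, since Delta rejects spaces and backticks (among other characters) in column names. The path below is a placeholder.

    import re

    df = spark.read.option("header", "true").csv("/mnt/adls/raw/file.csv")  # placeholder path

    # Replace characters Delta does not allow in column names with underscores.
    for old in df.columns:
        df = df.withColumnRenamed(old, re.sub(r"[ ,;{}()`\n\t=]+", "_", old))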
0
votes
1 answer

Databricks DLT pipeline overwrites data in the target table instead of appending to existing tables

We have Kafka topics to process every day as a batch job. When the pipeline is triggered, we initially load the Kafka data as-is to an ADLS location (Landing). The data received from Kafka is all CDC data, which will have KEY, VALUE, OFFSET,…
Yuva
  • 2,831
  • 7
  • 36
  • 60