Questions tagged [delta-live-tables]

Databricks Delta Live Tables (DLT) is an ETL framework that uses a simple declarative approach to building reliable data pipelines and automatically managing your infrastructure at scale.

Delta Live Tables simplifies development of reliable data pipelines in Python & SQL by providing a framework that automatically handles dependencies between components, enforces data quality, removes administrative overhead with automatic cluster & data maintenance, ...

149 questions
3
votes
1 answer

Databricks Delta Live Table - How To Simply Append A Batch Source To a DLT Table?

Using Python and all the relevant DLT properties within Databricks, does anyone know how to simply append to a DLT table from a batch source? In PySpark you can just use df.write.format("delta").mode("append") but since dlt requires you to return a…
Luke88
  • 33
  • 4
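A minimal sketch of one way to get append-only behavior, assuming (hypothetically) that the batch source is itself stored as a Delta table named source_db.batch_events; reading it as a stream lets DLT append only new records on each pipeline update instead of recomputing the whole table:

```python
import dlt

@dlt.table(name="events_appended")
def events_appended():
    # A streaming read over a Delta source picks up only newly appended
    # files, so the resulting DLT table grows append-only across runs.
    return spark.readStream.table("source_db.batch_events")
```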
3
votes
1 answer

Creating a table in Pyspark within a Delta Live Table job in Databricks

I am running a DLT (Delta Live Table) job that creates a Bronze table > Silver table for two separate tables. So in the end, I have two separate gold tables that I want merged into one table. I know how to do it in SQL but every time I run…
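A minimal sketch of a union in Python DLT, assuming two upstream tables with compatible schemas (the names silver_a and silver_b are hypothetical); dlt.read() wires the dependency into the pipeline graph automatically:

```python
import dlt

@dlt.table(name="gold_combined")
def gold_combined():
    a = dlt.read("silver_a")
    b = dlt.read("silver_b")
    # unionByName tolerates differing column order between the inputs
    return a.unionByName(b)
```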
3
votes
1 answer

How to change partition columns in delta live tables?

I first set up a Delta Live Table using Python as follows: @dlt.table def transaction(): return ( spark .readStream .format("cloudFiles") .schema(transaction_schema) .option("cloudFiles.format", "parquet") .load(path) …
Tse Kit Yam
  • 173
  • 8
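Partitioning is declared on the table decorator itself; a minimal sketch below, with a hypothetical partition column and source path. Changing partition_cols on an existing table generally requires a full refresh of the pipeline so the table is rewritten with the new layout:

```python
import dlt

@dlt.table(
    name="transaction",
    partition_cols=["transaction_date"],  # hypothetical partition column
)
def transaction():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .load("/mnt/raw/transactions")  # hypothetical path
    )
```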
3
votes
1 answer

How to use Apache Sedona on Databricks Delta Live tables?

I am trying to run some geospatial transformations in Delta Live Tables, using Apache Sedona. I tried defining a minimal example pipeline demonstrating the problem I encounter. In the first cell of my notebook, I install the apache-sedona Python package: %pip…
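A minimal sketch of the usual registration step, assuming the apache-sedona package and its matching Spark JARs are available on the pipeline cluster (installing only the Python wheel via %pip is typically not enough); the source table name is hypothetical, and the result is converted back to WKT text so Delta can store it:

```python
import dlt
from sedona.register import SedonaRegistrator

# Registers Sedona's ST_* SQL functions on the session (older-style API)
SedonaRegistrator.registerAll(spark)

@dlt.table(name="geo_bronze")
def geo_bronze():
    # ST_AsText converts the geometry back to WKT, a type Delta can persist
    return spark.sql(
        "SELECT id, ST_AsText(ST_Buffer(ST_GeomFromWKT(wkt), 1.0)) AS buffered_wkt "
        "FROM my_raw_points"  # hypothetical source table
    )
```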
2
votes
1 answer

How to read DeltaLake table using Pyspark

I have a Delta Lake table (parquet format) in an AWS S3 bucket. I need to read it into a dataframe using Pyspark in notebook code. I tried searching online but no success yet. Can anyone share sample code of how to read a Delta Lake table in Pyspark (…
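A minimal sketch, assuming the cluster already has credentials for the (hypothetical) bucket path:

```python
# Read directly from the S3 path
df = spark.read.format("delta").load("s3://my-bucket/path/to/delta-table")
df.show(5)

# Or, if the table is registered in the metastore:
df = spark.read.table("my_db.my_delta_table")
```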
2
votes
2 answers

How to set up authorization of Delta Live Tables to access Azure Data Lake files?

I am writing Delta Live Tables notebooks in SQL to access files from the data lake, something like this: CREATE OR REFRESH STREAMING LIVE TABLE MyTable AS SELECT * FROM cloud_files("DataLakeSource/MyTableFiles", "parquet",…
FAA
  • 179
  • 11
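One common approach is to grant the pipeline a service principal via Spark configuration, set in the DLT pipeline settings or the cluster spark_conf. A minimal sketch of the usual ABFS OAuth keys; the storage account, tenant, and secret scope names are all hypothetical:

```python
storage = "mystorageaccount"  # hypothetical ADLS Gen2 account
conf = {
    f"fs.azure.account.auth.type.{storage}.dfs.core.windows.net": "OAuth",
    f"fs.azure.account.oauth.provider.type.{storage}.dfs.core.windows.net":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    f"fs.azure.account.oauth2.client.id.{storage}.dfs.core.windows.net":
        "<application-id>",
    # Reference a Databricks secret rather than hard-coding the value
    f"fs.azure.account.oauth2.client.secret.{storage}.dfs.core.windows.net":
        "{{secrets/my-scope/my-sp-secret}}",
    f"fs.azure.account.oauth2.client.endpoint.{storage}.dfs.core.windows.net":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}
```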
2
votes
1 answer

How to WriteStream Delta live tables to a Kafka topic

In my DLT pipeline, I have three layers - bronze, silver, and gold. The bronze layer reads JSON files from an S3 bucket, while the silver layer performs data processing tasks such as adding new columns. The gold layer is responsible for performing…
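DLT pipelines have traditionally written only to Delta tables, so a common workaround is a separate structured-streaming job that tails the gold table and publishes to Kafka. A minimal sketch; the broker, topic, table, and checkpoint names are hypothetical:

```python
from pyspark.sql.functions import to_json, struct

(spark.readStream.table("my_schema.gold_table")    # hypothetical gold table
    .select(to_json(struct("*")).alias("value"))   # Kafka expects a value column
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("topic", "gold-events")
    .option("checkpointLocation", "/tmp/checkpoints/gold_to_kafka")
    .start())
```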
2
votes
2 answers

create_streaming_live_table in DLT creates a VIEW instead of a delta table

I have the following piece of code and am able to run it as a DLT pipeline successfully @dlt.table( name = source_table ) def source_ds(): return spark.table(f"{raw_db_name}.{source_table}") ### Create the target table…
Yuva
  • 2,831
  • 7
  • 36
  • 60
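With the current API names, declaring the target via dlt.create_streaming_table and populating it with dlt.apply_changes yields a backing Delta streaming table rather than a view. A minimal sketch; the key and sequence columns are hypothetical:

```python
import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table(name="target_table")

dlt.apply_changes(
    target="target_table",
    source="source_ds",
    keys=["id"],                    # hypothetical primary key
    sequence_by=col("updated_at"),  # hypothetical ordering column
)
```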
2
votes
2 answers

How to make sure values are mapped to the right delta table column?

I'm writing a PySpark job to read the Values column from table1. Table1 has two columns -> ID, Values Sample data in the Values column: +----+-----------------------------------+ | ID | values …
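The sample data is truncated, so the exact layout is unclear; a minimal sketch assuming (hypothetically) that the values column holds JSON, where parsing with an explicit schema assigns each field to its target column by name rather than by position:

```python
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType

# Hypothetical target schema for the parsed fields
schema = StructType([
    StructField("col_a", StringType()),
    StructField("col_b", StringType()),
])

parsed = (spark.read.table("table1")
    .withColumn("parsed", from_json(col("values"), schema))
    .select("ID", "parsed.col_a", "parsed.col_b"))
```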
2
votes
1 answer

Truncate silver delta live table and reload

I have a parameter value that determines whether the table needs a full load or an incremental load. In delta live tables, incremental load is not an issue as we apply changes and specify whether the table needs to be SCD1 or SCD2. However,…
RLH
  • 35
  • 4
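For the full-load case, one option is to trigger a full refresh programmatically, which truncates and rebuilds the pipeline's tables. A minimal sketch against the Databricks pipelines REST API; the host, token, and pipeline id are hypothetical:

```python
import requests

resp = requests.post(
    "https://<workspace-host>/api/2.0/pipelines/<pipeline-id>/updates",
    headers={"Authorization": "Bearer <token>"},
    json={"full_refresh": True},  # truncate and reload all tables
)
resp.raise_for_status()
```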
2
votes
1 answer

How to separate Delta Live Tables production and development targets and repo branches?

I am trying to replicate some common data & analytics workflows using Delta Live Tables. Currently I am struggling to wrap my head around how to achieve the requirements below: Have different targets (hive metastore) to write into based on dev or…
Michael Brenndoerfer
  • 3,483
  • 2
  • 39
  • 50
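A common pattern is one pipeline per environment, each pointed at its own repo branch and target schema, with a configuration key the notebook reads at graph-build time. A minimal sketch; the key name mypipeline.env is hypothetical:

```python
import dlt

# Each pipeline sets e.g. {"mypipeline.env": "dev"} in its configuration
env = spark.conf.get("mypipeline.env", "dev")

@dlt.table(name=f"sales_{env}")
def sales():
    return spark.read.table("raw_db.sales")  # hypothetical source
```

The target database itself is usually set per pipeline via the pipeline's target setting, so the same notebook can also write to different schemas without name suffixes.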
2
votes
1 answer

DLT notebook calls the same table definition multiple times

I have a dlt table defined in my DLT notebook that should run exactly once. However, it always runs multiple times. It is as simple as this. This gives me errors when defining other tables. Why? Is DLT parallelizing my function and that's…
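This is expected to some degree: DLT evaluates the decorated functions to resolve the pipeline's dependency graph and may invoke them more than once, so the function must be a pure, side-effect-free description of the table. A minimal sketch with a hypothetical source:

```python
import dlt

@dlt.table(name="my_table")
def my_table():
    # No writes, prints, or counters here -- just return the DataFrame.
    # DLT may call this function several times while planning the graph.
    return spark.read.table("raw_db.events")  # hypothetical source
```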
2
votes
2 answers

SCD-2 using delta live table

Delta live table now has the capability to do SCD Type 2 changes. But after going through this feature, I understood that this will work if I have only one new row with a new effective date. In the scenario where I have two new rows with two…
Rajib Deb
  • 1,496
  • 11
  • 30
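A minimal sketch of SCD Type 2 via apply_changes; when several rows arrive for the same key, sequence_by orders them and each version gets its own validity window. The table, key, and sequence names are hypothetical:

```python
import dlt
from pyspark.sql.functions import col

dlt.create_streaming_table(name="customers_scd2")

dlt.apply_changes(
    target="customers_scd2",
    source="customers_updates",
    keys=["customer_id"],
    sequence_by=col("effective_date"),
    stored_as_scd_type=2,  # keep full history with start/end columns
)
```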
2
votes
0 answers

Delta Live Tables using SCD type 1

I'm trying to load data using DLT and SCD 1 and am running into the error message "Detected a data update in the source table at version x. This is currently not supported. If you'd like to ignore updates, set the option 'ignoreChanges' to…
AndyMN
  • 41
  • 2
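A minimal sketch of the option the error message names, applied to a streaming read of a hypothetical source path; newer runtimes offer skipChangeCommits as the replacement:

```python
df = (spark.readStream
    .format("delta")
    .option("ignoreChanges", "true")    # newer runtimes: skipChangeCommits
    .load("/mnt/silver/source_table"))  # hypothetical source path
```

Note that ignoreChanges can re-deliver rewritten rows downstream, so deduplication may still be needed.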
2
votes
0 answers

Delta Live Tables and ingesting AVRO

So, I'm trying to load avro files into DLT and create pipelines and so forth. As a simple data frame in Databricks, I can read and unpack the avro files using json functions / rdd.map / lambda functions. From there I can create a temp view then do a sql…
jo80
  • 21
  • 2
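Auto Loader can ingest Avro directly, so the manual RDD unpacking isn't needed inside DLT. A minimal sketch with a hypothetical landing path:

```python
import dlt

@dlt.table(name="avro_bronze")
def avro_bronze():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "avro")
        .load("/mnt/landing/avro/")  # hypothetical landing path
    )
```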