Questions tagged [delta-live-tables]

Databricks Delta Live Tables (DLT) is an ETL framework that uses a declarative approach to build reliable data pipelines and automatically manage infrastructure at scale.

Delta Live Tables simplifies development of reliable data pipelines in Python and SQL by providing a framework that automatically handles dependencies between components, enforces data quality, removes administrative overhead with automatic cluster and data maintenance, ...

149 questions
0
votes
0 answers

Custom logging in Databricks delta live tables, dlt

I am using DLT with Python in one of our ETL pipelines, and have Kafka topics to be processed using Delta Live Tables. Since we couldn't print any log / status messages when running the DLT pipelines, I tried custom logging using the logging library. …
Yuva
  • 2,831
  • 7
  • 36
  • 60
0
votes
0 answers

DLT - merge/apply_changes using partition pruning

Is there any way to scan the target table partitions and apply the merge only in those partitions using Delta Live Tables? I have a DLT (Delta Live Tables) table created with partitions (reference :…
anky
  • 74,114
  • 11
  • 41
  • 70
0
votes
0 answers

Can I automate and pass dynamic parameters to a Delta Live Table pipeline at running time?

I need to execute a DLT pipeline from a Job, and I would like to know if there is any way of passing a dynamic input parameter to apply a filter in Databricks. I know about the settings in the pipeline that we use in the DLT notebook, but it seems we can only…
0
votes
1 answer

Difference between Structured Streaming and Delta Live Tables in Databricks

I'm interested in the difference between Structured Streaming and Delta Live Tables. Databricks says: "For most streaming or incremental data processing or ETL tasks, Databricks recommends Delta Live Tables." Does it mean I should always stick…
0
votes
0 answers

delta live tables aggregations in gold layer

In our DLT gold layer we have some aggregation queries that are live, so they compute the whole thing. We would like to make this quicker and use CDF for business-level aggregates, as in https://www.databricks.com/notebooks/delta-lake-cdf.html We…
0
votes
0 answers

How to resolve the error "the spark driver has stopped unexpectedly and is restarting" when converting PySpark Dataframe to H2O Frame

I recently started exploring the field of Data Engineering and came across some difficulties. I have a bucket in GCS with millions of parquet files and I want to create an Anomaly Detection model with them. I was trying to ingest that data into…
drupal2me
  • 69
  • 4
0
votes
1 answer

Parsing nested JSON in Databricks Delta Live Tables

1st Question => Can we parse nested JSON in a SQL notebook and load it into a Delta Live Table? 2nd Question => I am able to parse nested JSON using a Python notebook and print it. But can I load the data into a Delta Live Table from…
Koushik Chandra
  • 1,565
  • 12
  • 37
  • 73
0
votes
1 answer

Databricks DeltaLivetable Silver not refreshing

We have a data import running from SQL Server into Delta Live Tables. Synapse imports data into the landing zone; Databricks then loads it into bronze and silver tables. We use CDC processing version one as described…
marritza
  • 22
  • 5
0
votes
0 answers

Duplication data from streams on merge in Delta Tables

I have a source table with, say, the following data:
+----------------+---+--------+-----------------+---------+
|registrationDate| id|custName|            email|eventName|
+----------------+---+--------+-----------------+---------+
|      17-02-2023|  2|…
0
votes
1 answer

Databricks DLT reading a table from one schema(bronze), process CDC data and store to another schema (processed)

I am developing an ETL pipeline using Databricks DLT pipelines for CDC data that I receive from Kafka. I have successfully created 2 pipelines, for the landing and raw zones. The raw one will have an operation flag and a sequence column, and I would like to…
Yuva
  • 2,831
  • 7
  • 36
  • 60
0
votes
0 answers

How to add a validation on delta table column dynamically?

I'm working on a transformation and am stuck on a common problem. Any help is much appreciated. Scenario: Step-1: Reading from a delta table.
+--------+------------------+
| emp_id | str              |
+--------+------------------+
| 1      |…
0
votes
0 answers

Trying to write and read delta table in the same pyspark structured streaming job. Can't see data

Is it possible for a PySpark job to write to a Delta table and also read from the same table in the same code? Here is what I'm trying to do. Problem statement: I'm having trouble printing the data on the console to see what is flowing. from pyspark.sql…
0
votes
1 answer

PySpark structured streaming read Kafka to delta table

I'm exploring PySpark Structured Streaming and Databricks. I want to write a Spark Structured Streaming job to read all the data from a Kafka topic and publish it to Delta tables. Let's assume I'm using the latest version and Kafka has the following details. kafka…
0
votes
1 answer

Creating a delta live table for reading incremental data

I have a table "x" which contains raw data in the bronze layer. I have another table "Y" which is in the silver layer and contains the transformed data. Now incremental data is arriving in table x and I want to merge the incremental data from table x…
0
votes
0 answers

Selectively overwrite partitions in Delta Live Table pipeline

I have a relatively big table that is overwritten in a DLT pipeline. It is partitioned by date, and in most cases I change only a small portion of the data (connected to the last couple of partitions). Is it possible to selectively overwrite only specified…
partlov
  • 13,789
  • 6
  • 63
  • 82