Questions tagged [delta-live-tables]

Databricks Delta Live Tables (DLT) is a declarative ETL framework for building reliable data pipelines and automatically managing the underlying infrastructure at scale.

Delta Live Tables simplifies the development of reliable data pipelines in Python & SQL by providing a framework that automatically handles dependencies between components, enforces data quality, and removes administrative overhead through automatic cluster & data maintenance, ...
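For context, a minimal sketch of what a DLT pipeline definition looks like with the Python API; the table names, storage path, and columns below are placeholders rather than anything taken from the questions on this page:

```python
# Minimal Delta Live Tables sketch (Python API); names and paths are placeholders.
# `spark` is provided by the Databricks/DLT runtime.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw events ingested from cloud storage")
def raw_events():
    return (spark.readStream.format("cloudFiles")      # Auto Loader
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/events"))               # placeholder path

@dlt.table(comment="Cleaned events")
@dlt.expect_or_drop("valid_id", "id IS NOT NULL")       # data quality expectation
def clean_events():
    # DLT infers the dependency on raw_events from this call
    return dlt.read_stream("raw_events").withColumn("ingested_at", F.current_timestamp())
```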

149 questions
0
votes
0 answers

Can we use Delta Live Tables with open-source Delta Lake, e.g. with MinIO object storage?

Can we use Delta Live Tables with open-source Delta Lake? We are currently using MinIO object storage, and I would like to know whether DLT can be used for transformations in such cases. I am doing R&D for my project's data architecture.
0
votes
1 answer

Enzyme feature in Databricks

I am not able to find more information on this feature to understand what it actually solves. Is there documentation explaining what Enzyme actually does?
Rajib Deb
  • 1,496
  • 11
  • 30
0
votes
1 answer

Databricks: truncate Delta table and restart identity at 1

We created a SQL notebook in Databricks and are trying to develop a one-time script. We have to truncate and load the data every time, and the generated identity column should always restart at 1 after we truncate and load the data. The sequence of id…
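As far as I know, TRUNCATE on a Delta table does not reset the identity column's counter, so one commonly suggested workaround is to recreate the table instead. A minimal sketch, with hypothetical table and column names, driven from Python:

```python
# Sketch: recreate the staging table instead of truncating it, so the identity
# column is assigned starting from 1 again. All names are placeholders, and
# `spark` is the Databricks-provided session.
spark.sql("""
    CREATE OR REPLACE TABLE my_schema.staging_tbl (
        id BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
        payload STRING
    ) USING DELTA
""")

# Reload the data; the id column is generated automatically on insert.
(spark.table("my_schema.source_tbl")
    .select("payload")
    .write.mode("append")
    .saveAsTable("my_schema.staging_tbl"))
```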
0
votes
0 answers

Delta Live Tables expectations syntax looks different

I was looking at some code and saw that expect_all is written as dlt.expect_all(dict_expectations)(dlt_quarantine_view). My understanding of the syntax was that it takes a dictionary of expectations and executes them; I am not able to…
Rajib Deb
  • 1,496
  • 11
  • 30
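For readers puzzled by the same pattern: dlt.expect_all(dict_expectations) returns a decorator, so calling it as dlt.expect_all(dict_expectations)(fn) is simply that decorator applied without the @ syntax. A hedged sketch, with placeholder expectation names and tables:

```python
import dlt

dict_expectations = {
    "valid_id": "id IS NOT NULL",        # placeholder expectations
    "valid_amount": "amount >= 0",
}

# Usual decorator form:
@dlt.table()
@dlt.expect_all(dict_expectations)       # attaches all expectations to the table
def quarantine_view():
    return dlt.read("bronze_events")     # placeholder upstream table

# Equivalent call form, as in the code the question describes:
# dlt.expect_all(dict_expectations) builds a decorator, which is then applied
# explicitly to an existing table-defining function, e.g.
#   dlt.expect_all(dict_expectations)(dlt_quarantine_view)
```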
0
votes
0 answers

How to alter a Delta Live Table (add and delete columns) and run without a full refresh

I want to alter a Delta Live Table (add and delete columns, change a column's length) without re-processing legacy data; only new data should be processed with the new changes. I altered the Delta table manually and it resulted in an error.
Deepak
  • 1
0
votes
1 answer

applyInPandas() aggregation runs slowly on big delta table

I'm trying to create a gold table notebook in Databricks, but it would take 9 days to fully reprocess the historical data (43 GB, 35k Parquet files). I tried scaling up the cluster but it doesn't go above 5,000 records/second. The bottleneck seems…
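A hedged sketch of the pattern involved, with placeholder table, key, and metric names. Each applyInPandas group is materialized as a single pandas DataFrame on one worker, which is frequently where such jobs stall, so a native aggregation is shown alongside for comparison:

```python
import pandas as pd
from pyspark.sql import functions as F

def summarize(pdf: pd.DataFrame) -> pd.DataFrame:
    # Runs once per group, entirely on one worker, as a pandas DataFrame.
    return pd.DataFrame({
        "device_id": [pdf["device_id"].iloc[0]],
        "avg_value": [pdf["value"].mean()],
    })

df = spark.read.table("silver_events")            # placeholder source table

gold_pandas = df.groupBy("device_id").applyInPandas(
    summarize, schema="device_id string, avg_value double")

# A native aggregation avoids the Arrow/pandas serialization cost entirely.
gold_native = df.groupBy("device_id").agg(F.avg("value").alias("avg_value"))
```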
0
votes
1 answer

spark streaming and delta tables: java.lang.UnsupportedOperationException: Detected a data update

The setup: Azure Event Hub -> raw Delta table -> agg1 Delta table -> agg2 Delta table. The data is processed by Spark Structured Streaming. Updates on the target Delta tables are done via foreachBatch using merge. As a result I'm getting…
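That exception is usually raised when a streamed Delta source has its files rewritten by an upstream MERGE. A hedged sketch of the pattern, with placeholder table names, join key, and paths (skipChangeCommits is the newer Delta option, ignoreChanges the older equivalent):

```python
from delta.tables import DeltaTable

def upsert_to_agg1(batch_df, batch_id):
    # Upsert each micro-batch into the target Delta table.
    (DeltaTable.forName(spark, "agg1").alias("t")
        .merge(batch_df.alias("s"), "t.key = s.key")
        .whenMatchedUpdateAll()
        .whenNotMatchedInsertAll()
        .execute())

(spark.readStream
    .option("skipChangeCommits", "true")          # tolerate upstream MERGE rewrites
    .table("raw")
    .writeStream
    .foreachBatch(upsert_to_agg1)
    .option("checkpointLocation", "/chk/agg1")    # placeholder path
    .start())
```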
0
votes
1 answer

Databricks DLT Syntax for Read_Stream Union

New to DLT; struggling with the Python syntax for returning a DataFrame via the dlt.read_stream operator as a union (unionByName) of two other live tables. My pipeline is as follows. WORKS: Table1: @dlt.table() def table_1() return spark.sql…
ExoV1
  • 97
  • 1
  • 7
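A hedged sketch of one way to express that union in DLT Python, assuming two upstream live tables named table_1 and table_2 as in the question:

```python
import dlt

@dlt.table()
def table_union():
    df1 = dlt.read_stream("table_1")
    df2 = dlt.read_stream("table_2")
    # unionByName aligns columns by name rather than by position
    return df1.unionByName(df2, allowMissingColumns=True)
```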
0
votes
0 answers

Delta merge operation does not always insert/update all of the records

This happens from time to time, which is the strange part. My current solution: re-run the job! But this is very reactive, and I'm not happy with it. This is what my merge statement looks like: MERGE INTO target_tbl AS Target USING df_source AS Source…
Up_One
  • 5,213
  • 3
  • 33
  • 65
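For reference, the same merge driven from Python, with the source deduplicated on the merge key as a defensive step; target_tbl and df_source come from the excerpt, while the id key and everything else are placeholders:

```python
# Sketch of the merge from the question driven from Python/Spark SQL.
df_source = spark.read.table("staging_tbl")                  # placeholder source
df_source.dropDuplicates(["id"]).createOrReplaceTempView("df_source")

spark.sql("""
    MERGE INTO target_tbl AS Target
    USING df_source AS Source
      ON Target.id = Source.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```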
0
votes
2 answers

How to use a variable as the Python logger level

Is there any way to use a variable for the Python logger level instead of hard-coded levels (error, info, ...)? I get the event level from Delta Live Tables events: level_log = event.level // this is from Delta Live Tables events log_event.{level_log}(level, extra=extra)…
Jelena Ajdukovic
  • 311
  • 3
  • 12
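A hedged sketch of one way to do this with the standard logging module, assuming the event level is a string such as "INFO" or "ERROR"; the logger name and extra fields are placeholders:

```python
import logging

logger = logging.getLogger("dlt_events")         # placeholder logger name

def log_event(level_name, message, extra=None):
    # getLevelName maps "INFO" -> 20, "ERROR" -> 40, etc.
    level = logging.getLevelName(str(level_name).upper())
    if not isinstance(level, int):
        level = logging.INFO                     # fall back for unknown strings
    logger.log(level, message, extra=extra)

log_event("ERROR", "pipeline update failed", extra={"pipeline_id": "123"})
```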
0
votes
1 answer

PySpark problem flattening array with nested JSON and other elements

I'm struggling with the correct syntax to flatten some data. I have a DLT table with a column (named lorem for the sake of the example) where each row looks like this: [{"field1": {"field1_1": null, "field1_2": null}, "field2": "blabla",…
António Mendes
  • 173
  • 1
  • 10
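A hedged sketch of the usual explode-then-select approach; the column name lorem and the field names come from the excerpt, while the table name is a placeholder:

```python
from pyspark.sql import functions as F

df = spark.read.table("my_dlt_table")            # placeholder table

flat = (df
    .withColumn("item", F.explode("lorem"))      # one row per array element
    .select(
        "*",
        F.col("item.field1.field1_1").alias("field1_1"),
        F.col("item.field1.field1_2").alias("field1_2"),
        F.col("item.field2").alias("field2"),
    )
    .drop("lorem", "item"))
```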
-1
votes
0 answers

Import error with the deltaray library - 'type' object is not subscriptable

I wonder if anyone can give me advice on why I am getting this error when I install the deltaray library. I am trying to run their demo in a Google Colab notebook. The installation seems to run fine and I restart the runtime as requested. Here's…
Juliana
  • 31
  • 4
-2
votes
0 answers

Need help ingesting data into Azure Databricks from Kafka

I need help ingesting data into an Azure Databricks SQL warehouse table from Kafka, in a batch job that needs to run every hour; only the new data in Kafka should be synced into the Databricks SQL table. Please let me know how this can be…
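A hedged sketch of the usual incremental pattern: a Structured Streaming read from Kafka written with an availableNow trigger and scheduled as an hourly job, so the checkpoint ensures only new records are picked up on each run. Broker, topic, paths, and table names are placeholders:

```python
df = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")    # placeholder broker
    .option("subscribe", "my_topic")                     # placeholder topic
    .option("startingOffsets", "earliest")
    .load())

(df.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
    .writeStream
    .trigger(availableNow=True)                  # drain what's available, then stop
    .option("checkpointLocation", "/chk/kafka_ingest")   # placeholder path
    .toTable("bronze.kafka_events"))             # placeholder target table
```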