
Databricks Serverless Compute - I know this is still in preview, is available by request, and is only offered on AWS.

Can it be used to read and write (update) Delta tables, or is it read-only?

And is it a good fit for small, transactional queries, or is Azure SQL better for that?

Azure SQL seems to be faster than Databricks for small queries.

Since Databricks has to go through the Hive Metastore when querying Delta tables, will this impact performance?
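
For context, here is a rough PySpark sketch of what I mean by going through the metastore versus reading by path; the table and path names are just placeholders:

```python
from pyspark.sql import SparkSession

# On Databricks, `spark` already exists; building one here only so the
# sketch is self-contained. Table and path names are hypothetical.
spark = SparkSession.builder.getOrCreate()

# Read via the Hive Metastore: the table name is resolved to the underlying
# Delta location (and its transaction log) before any data is read.
df_meta = spark.read.table("analytics.orders")

# Read the same Delta table directly by storage path, skipping the metastore
# lookup (but still reading the Delta transaction log).
df_path = spark.read.format("delta").load("/mnt/lake/analytics/orders")

df_meta.filter("order_status = 'OPEN'").show(5)
```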

  • Well, based on my experience, I use Azure Databricks only for transforming data when some syntax isn't available yet in Synapse (I'm using Azure Synapse), e.g. `GROUP BY CUBE`. For performance, I'd still recommend transforming the data in SQL (here, Azure Synapse) – MADFROST Sep 13 '21 at 07:11
  • I used Databricks to transform the data with `GROUP BY CUBE` and write it to Delta, and the estimated time was 10 hours, whereas doing it on Azure Synapse only needed 4 minutes. This is still open in my [Issue](https://stackoverflow.com/questions/69068536/how-to-increase-databricks-performance) – MADFROST Sep 13 '21 at 07:12

1 Answer


According to the release notes (June 17, 2021), the new Photon executor is switched on for SQL endpoints, and it also supports writes to Delta tables (and to Parquet).
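
To make that concrete, here is a minimal sketch of the kind of write this allows, assuming a hypothetical `demo.watermarks` Delta table; the same SQL statements could be submitted to a SQL endpoint instead of going through `spark.sql()`:

```python
from pyspark.sql import SparkSession

# Assumes a Databricks / Delta-enabled session; names are made up.
spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS demo")
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.watermarks (
        source      STRING,
        last_loaded TIMESTAMP
    ) USING DELTA
""")

# Writes (INSERT / UPDATE) against the Delta table, not just reads.
spark.sql("""
    INSERT INTO demo.watermarks VALUES ('salesforce', current_timestamp())
""")
spark.sql("""
    UPDATE demo.watermarks
    SET last_loaded = current_timestamp()
    WHERE source = 'salesforce'
""")
```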

If you want to run a lot of small queries on a set of data, then I'd say Azure SQL (or operations on a Spark DataFrame loaded from the Delta table) should always outperform the same thing expressed in SQL running directly against a Delta Lake table, since the latter has to negotiate the versioned Parquet files and the Delta Lake transaction log on your behalf.
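
As a rough sketch of the "Spark DataFrame" option (table name is hypothetical, and this assumes a Databricks / Delta-enabled session): load the Delta table once, cache it, and run the small queries against the cached DataFrame instead of renegotiating the Delta log for every request.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Load the Delta table once; this is where the versioned Parquet files and
# the transaction log get resolved.
orders = spark.read.table("analytics.orders").cache()
orders.count()  # materialize the cache

# Subsequent small queries hit the cached DataFrame.
open_count = orders.filter(F.col("order_status") == "OPEN").count()
by_region = orders.groupBy("region").agg(F.count("*").alias("n"))
by_region.show(5)
```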

  • True, I have a few use cases to acquire data from cloud sources (e.g. Salesforce, Dynamics), and I need to host these configuration tables (watermark tables), which are purely for Azure Data Factory workflow management. These pipelines just get data from the source system and land it in the lake as .csv or .parquet. For this type of configuration I think Azure SQL is the best place, as there is no processing (Databricks) required – Sreedhar Sep 15 '21 at 06:19