
Databricks Serverless Compute - I know this is still in preview, is available by request, and is only offered on AWS.

Can it be used to read and write (update) Delta tables, or is it read-only?

And is it a good fit for small, transactional queries, or is Azure SQL better for that?

Azure SQL seems to be faster than Databricks for small queries.

Since Databricks has to go through the Hive Metastore when querying Delta tables, will this impact performance?
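
For context, here is a rough PySpark sketch of what I mean by going through the metastore versus reading by path; the table and path names are just placeholders:

```python
from pyspark.sql import SparkSession

# On Databricks, `spark` already exists; building one here only so the
# sketch is self-contained. Table and path names are hypothetical.
spark = SparkSession.builder.getOrCreate()

# Read via the Hive Metastore: the table name is resolved to the underlying
# Delta location (and its transaction log) before any data is read.
df_meta = spark.read.table("analytics.orders")

# Read the same Delta table directly by storage path, skipping the metastore
# lookup (but still reading the Delta transaction log).
df_path = spark.read.format("delta").load("/mnt/lake/analytics/orders")

df_meta.filter("order_status = 'OPEN'").show(5)
```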

  • Well, based on my experience, I use Azure Databricks only for transforming data when some syntax isn't available yet in Synapse (I'm using Azure Synapse), e.g. `GROUP BY CUBE`. For performance, I'd still recommend transforming the data in SQL (here, Azure Synapse) – MADFROST Sep 13 '21 at 07:11
  • I used Databricks to transform the data with `GROUP BY CUBE` and write it to Delta, and the estimated time was 10 hours, whereas doing it on Azure Synapse only needed 4 minutes. This is still open in my [Issue](https://stackoverflow.com/questions/69068536/how-to-increase-databricks-performance) – MADFROST Sep 13 '21 at 07:12

1 Answer


According to the release notes (June 17, 2021), the new Photon executor is switched on for SQL endpoints, and it also supports writes to Delta tables (and to Parquet).
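
To make that concrete, here is a minimal sketch of the kind of write this allows, assuming a hypothetical `demo.watermarks` Delta table; the same SQL statements could be submitted to a SQL endpoint instead of going through `spark.sql()`:

```python
from pyspark.sql import SparkSession

# Assumes a Databricks / Delta-enabled session; names are made up.
spark = SparkSession.builder.getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS demo")
spark.sql("""
    CREATE TABLE IF NOT EXISTS demo.watermarks (
        source      STRING,
        last_loaded TIMESTAMP
    ) USING DELTA
""")

# Writes (INSERT / UPDATE) against the Delta table, not just reads.
spark.sql("""
    INSERT INTO demo.watermarks VALUES ('salesforce', current_timestamp())
""")
spark.sql("""
    UPDATE demo.watermarks
    SET last_loaded = current_timestamp()
    WHERE source = 'salesforce'
""")
```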

If you want to run a lot of small queries on a set of data, then I'd say Azure SQL (or operations on a Spark DataFrame loaded from the Delta table) should always outperform the same thing expressed in SQL running directly against a Delta Lake table, since the latter has to negotiate the versioned Parquet files and the Delta Lake transaction log on your behalf.
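
As a rough sketch of the "Spark DataFrame" option (table name is hypothetical, and this assumes a Databricks / Delta-enabled session): load the Delta table once, cache it, and run the small queries against the cached DataFrame instead of renegotiating the Delta log for every request.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Load the Delta table once; this is where the versioned Parquet files and
# the transaction log get resolved.
orders = spark.read.table("analytics.orders").cache()
orders.count()  # materialize the cache

# Subsequent small queries hit the cached DataFrame.
open_count = orders.filter(F.col("order_status") == "OPEN").count()
by_region = orders.groupBy("region").agg(F.count("*").alias("n"))
by_region.show(5)
```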

  • True, I have a few use cases to acquire data from cloud sources (e.g. Salesforce, Dynamics), and I need to host these configuration tables (watermark tables), which are purely for Azure Data Factory workflow management. These pipelines just get data from the source system and land it in the lake as .csv or .parquet. For this type of configuration I think Azure SQL is the best place, as there is no processing (Databricks) required – Sreedhar Sep 15 '21 at 06:19