
I'm looking at using both Databricks and Snowflake, connected by the Spark Connector, all running on AWS. I'm struggling to understand the following before triggering a decision:

  1. How well does the Spark Connector perform? (performance, extra costs, compatibility)
  2. What comparisons can be made between Databricks SQL and Snowflake SQL in terms of performance and standards?
  3. What have been the “gotchas” or unfortunate surprises about trying to use both?


Snowflake has invested in the Spark connector's performance, and according to its published benchmarks [0] it performs well.
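For orientation, a minimal read through the connector in PySpark looks roughly like this. This is a sketch, not a tested setup: all connection values are placeholders, and it assumes the Snowflake connector JAR is already on the cluster's classpath (on Databricks it ships built in).

```python
# Sketch of reading from Snowflake in PySpark via the Snowflake connector.
# Every value below is a placeholder -- substitute your own account details.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

def read_snowflake(spark, query):
    """Run `query` inside Snowflake and return the result as a Spark DataFrame."""
    return (spark.read
            .format("snowflake")      # short name registered by the connector
            .options(**sf_options)
            .option("query", query)   # the query text executes in Snowflake
            .load())
```

The important detail for the performance question: whatever you pass as `query` runs inside Snowflake, so only the result set crosses the network into Spark.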

The SQL dialects are similar. "Databricks SQL maintains compatibility with Apache Spark SQL semantics." [1] "Snowflake supports most of the commands and statements defined in SQL:1999." [2]

I haven't run into gotchas myself, though I would avoid putting the two in different AWS regions, since cross-region traffic adds latency and data-transfer cost. Also note that the performance characteristics of Databricks SQL changed as of 6/17, when they made their Photon engine the default.

As always, the value will depend on your use case. For example:

  • If you were running analytical Databricks SQL queries over partitioned, compressed Parquet in Delta Lake, performance ought to be roughly comparable to Snowflake's -- but if you were running those same analytical queries against a JDBC MySQL connection, Snowflake's performance should be vastly better.
  • If you were doing wide table-scan queries (e.g. select * from foo with no where and no limit) in Databricks SQL and then doing the analysis in a notebook kernel (or similar), switching to Snowflake isn't going to do much for you.

etc.
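To make that second bullet concrete: the engine matters less than how much work you can push into it. A hypothetical sketch (table and column names are invented for illustration):

```python
# Hypothetical queries against a table `foo` (names are illustrative only).

# A full scan ships every row over the network to the client,
# so swapping warehouses barely changes the wall-clock time:
full_scan = "select * from foo"

# An aggregation pushed into the warehouse ships one row per group,
# which is where a fast remote engine actually pays off:
pushed_down = "select region, sum(amount) as total from foo group by region"

# With the Spark connector, either string could be passed as
# option("query", ...) -- only the query's result set leaves Snowflake,
# so the pushed-down version moves orders of magnitude less data.
```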

[0] - https://www.snowflake.com/blog/snowflake-connector-for-spark-version-2-6-turbocharges-reads-with-apache-arrow/

[1] - https://docs.databricks.com/sql/release-notes/index.html

[2] - https://docs.snowflake.com/en/sql-reference/intro-summary-sql.html

Nat Taylor