
I am referring to the following link: Hive Support for Spark

It says:

"Spark SQL supports a different use case than Hive."

I am not sure why that would be the case. Does this mean that, as a Hive user, I cannot use the Spark execution engine through Spark SQL?

Some Questions:

  • Spark SQL uses the Hive query parser, so ideally it should support all of Hive's functionality. Is that correct? (A sketch of what I mean follows this list.)
  • Will it use the Hive Metastore?
  • Will Hive use the Spark optimizer, or will it build its own optimizer?
  • Will Hive translate MR jobs into Spark jobs, or use some other paradigm?
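
To make the first point concrete, here is a minimal sketch of what I have in mind, written for Spark 1.1+ (where HiveContext.sql() parses HiveQL by default); the table name logs is just a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveQLOnSparkSQL {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveQLOnSparkSQL"))

    // HiveContext picks up hive-site.xml from the classpath if present,
    // so it can reuse an existing Hive metastore.
    val hiveContext = new HiveContext(sc)

    // Plain HiveQL: parsed by Hive's parser, planned by Spark's Catalyst
    // optimizer, and executed as Spark jobs rather than MapReduce jobs.
    hiveContext.sql("SELECT page, COUNT(*) AS hits FROM logs GROUP BY page")
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```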
Venkat

1 Answer


Spark SQL is intended to allow the use of SQL expressions on top of Spark's machine learning libraries. It lets you use SQL as one tool among others for building advanced analytics (e.g., ML) applications. It is not a drop-in replacement for Hive, which is really best suited to batch processing/ETL.
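
As a hedged illustration of that combination, here is a minimal sketch for the Spark shell (where sc already exists); the table and column names (sessions, duration, bytes) are placeholders:

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val hiveContext = new HiveContext(sc)

// Use SQL to select and shape the input data...
val features = hiveContext
  .sql("SELECT duration, bytes FROM sessions")
  .map(row => Vectors.dense(row.getDouble(0), row.getDouble(1)))
  .cache()

// ...then feed it directly to MLlib: cluster into 5 groups, 20 iterations.
val model = KMeans.train(features, 5, 20)
```

The point is that the SQL result is an ordinary RDD, so it composes directly with the rest of Spark's APIs.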

However, there is also ongoing work upstream to allow Spark to serve as a general data-processing backend for Hive. That work is what would let you take full advantage of Spark for Hive use cases specifically.
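
The design there (HIVE-7292) makes the execution engine a per-session Hive setting, so the switch should be a configuration change rather than a query rewrite; roughly like the following (hedged, since at the time of writing the feature is still in development):

```sql
-- In the Hive CLI/Beeline; today the shipped engines are mr and tez.
SET hive.execution.engine=spark;

-- The same HiveQL would then run as Spark jobs instead of MR jobs.
SELECT page, COUNT(*) FROM logs GROUP BY page;
```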

Justin Kestelyn
  • Thanks. A few questions: Spark SQL uses the Hive query parser, so ideally it should support all of Hive's functionality. Will it also use the Hive Metastore? Will Hive use the Spark optimizer or build its own? Will Hive translate MR jobs into Spark jobs, or use some other paradigm? – Venkat Aug 28 '14 at 15:52
  • IIRC, Spark SQL will use the Hive Metastore, yes, so you'll be able to run all Hive queries. As for Hive-on-Spark, see [this blog post](http://blog.cloudera.com/blog/2014/07/apache-hive-on-apache-spark-motivations-and-design-principles/), which describes the design principles. – Justin Kestelyn Aug 28 '14 at 23:37
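
A small, hedged config sketch to go with that last comment: the standard way to point Spark SQL at an existing Hive metastore is to place Hive's hive-site.xml in Spark's conf/ directory. The host and port below are placeholders for a remote metastore service.

```xml
<!-- conf/hive-site.xml -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- placeholder host:port; 9083 is the conventional metastore port -->
    <value>thrift://metastore-host.example.com:9083</value>
  </property>
</configuration>
```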