
I am referring to the following link: Hive Support for Spark

It says:

"Spark SQL supports a different use case than Hive."

I am not sure why that would be the case. Does this mean that, as a Hive user, I cannot use the Spark execution engine through Spark SQL?

Some Questions:

  • Spark SQL uses the Hive query parser, so ideally it should support all of Hive's functionality. Is that correct? (A sketch of what I mean follows this list.)
  • Will it use the Hive Metastore?
  • Will Hive use the Spark optimizer, or will it build its own optimizer?
  • Will Hive translate MR jobs into Spark jobs, or use some other paradigm?
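
To make the first point concrete, here is a minimal sketch of what I have in mind, written for Spark 1.1+ (where HiveContext.sql() parses HiveQL by default); the table name logs is just a placeholder:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveQLOnSparkSQL {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveQLOnSparkSQL"))

    // HiveContext picks up hive-site.xml from the classpath if present,
    // so it can reuse an existing Hive metastore.
    val hiveContext = new HiveContext(sc)

    // Plain HiveQL: parsed by Hive's parser, planned by Spark's Catalyst
    // optimizer, and executed as Spark jobs rather than MapReduce jobs.
    hiveContext.sql("SELECT page, COUNT(*) AS hits FROM logs GROUP BY page")
      .collect()
      .foreach(println)

    sc.stop()
  }
}
```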
Venkat

1 Answer


Spark SQL is intended to allow the use of SQL expressions on top of Spark's machine learning libraries. It lets you use SQL as one tool among others for building advanced analytics (e.g., ML) applications. It is not a drop-in replacement for Hive, which is really best suited to batch processing/ETL.
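
As a hedged illustration of that combination, here is a minimal sketch for the Spark shell (where sc already exists); the table and column names (sessions, duration, bytes) are placeholders:

```scala
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

val hiveContext = new HiveContext(sc)

// Use SQL to select and shape the input data...
val features = hiveContext
  .sql("SELECT duration, bytes FROM sessions")
  .map(row => Vectors.dense(row.getDouble(0), row.getDouble(1)))
  .cache()

// ...then feed it directly to MLlib: cluster into 5 groups, 20 iterations.
val model = KMeans.train(features, 5, 20)
```

The point is that the SQL result is an ordinary RDD, so it composes directly with the rest of Spark's APIs.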

However, there is also ongoing work upstream to allow Spark to serve as a general data-processing backend for Hive. That work is what would let you take full advantage of Spark for Hive use cases specifically.
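
The design there (HIVE-7292) makes the execution engine a per-session Hive setting, so the switch should be a configuration change rather than a query rewrite; roughly like the following (hedged, since at the time of writing the feature is still in development):

```sql
-- In the Hive CLI/Beeline; today the shipped engines are mr and tez.
SET hive.execution.engine=spark;

-- The same HiveQL would then run as Spark jobs instead of MR jobs.
SELECT page, COUNT(*) FROM logs GROUP BY page;
```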

Justin Kestelyn
  • Thanks. A few questions: Spark SQL uses the Hive query parser, so ideally it should support all of Hive's functionality. Will it also use the Hive Metastore? Will Hive use the Spark optimizer or build its own? Will Hive translate MR jobs into Spark jobs, or use some other paradigm? – Venkat Aug 28 '14 at 15:52
  • IIRC, Spark SQL will use the Hive Metastore, yes, so you'll be able to run all Hive queries. As for Hive-on-Spark, see [this blog post](http://blog.cloudera.com/blog/2014/07/apache-hive-on-apache-spark-motivations-and-design-principles/), which describes the design principles. – Justin Kestelyn Aug 28 '14 at 23:37
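
A small, hedged config sketch to go with that last comment: the standard way to point Spark SQL at an existing Hive metastore is to place Hive's hive-site.xml in Spark's conf/ directory. The host and port below are placeholders for a remote metastore service.

```xml
<!-- conf/hive-site.xml -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- placeholder host:port; 9083 is the conventional metastore port -->
    <value>thrift://metastore-host.example.com:9083</value>
  </property>
</configuration>
```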