What are the differences? Which one is better? When should each be used?
1 Answer
Installation
KSQL uses Kafka Streams and does not depend on Hive; it only needs Kafka and ZooKeeper.
Hive-Kafka requires Kafka, a HiveServer, and an RDBMS (MySQL, Postgres, etc.).
EcoSystem
For external integrations, Hive-Kafka does not offer Confluent Avro Schema Registry integration. It might (eventually?) offer Hortonworks Schema Registry integration, though.
Hortonworks' suite of tools around NiFi, Spark, Kafka, SMM, Atlas, Ranger, Hive-Streaming, etc. is probably all well tested together.
Confluent partners with other companies to ensure proper integrations with tools beyond Kafka and their own platform.
Interface
AFAIK, Hive-Kafka is only a query engine; it will not create/maintain KStream/KTable instances like KSQL does, and will always require a scan of the Kafka topic. It also has no native REST interface for submitting queries, so the only option for external access is JDBC/ODBC.
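For reference, Hive-Kafka works by mapping a topic to an external table via the Kafka storage handler; a minimal sketch might look like this (the topic name and columns here are hypothetical):

```sql
-- Map a Kafka topic as an external Hive table (names are illustrative)
CREATE EXTERNAL TABLE kafka_clicks (
  `userid` STRING,
  `url` STRING,
  `ts` TIMESTAMP
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "clicks",
  "kafka.bootstrap.servers" = "localhost:9092"
);
```

Every query against this table scans the topic, though the handler also exposes metadata columns such as `__partition`, `__offset`, and `__timestamp` that can be used in predicates to limit the scan.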
For a UI, Hive works well with Hue or Ambari Views, which are both open-source, but KSQL primarily only has Confluent Control Center, which is a paid solution.
"Better" is an opinion, but if you already have Hive, I see no reason not to use Hive-Kafka.
IMO, KSQL can complement Hive-Kafka by defining new topics as both tables and streams, as well as transforming/filtering Confluent's Avro format into JSON that Hive-Kafka can natively understand. From there you can join existing Hive data (HDFS, S3, HBase, etc.) with Hive-Kafka data, though there will likely be performance impacts.
Similarly, you can take Hive-Kafka topics and translate them into Avro in KSQL using the Schema Registry, to use with other tools like Kafka Connect or NiFi to have a more efficient wire format (binary-avro vs. json).
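As a sketch of that round-trip in KSQL (all stream and topic names below are hypothetical):

```sql
-- Avro topic -> JSON topic that Hive-Kafka can read directly
-- (the schema is pulled from the Confluent Schema Registry)
CREATE STREAM clicks_avro WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='AVRO');

CREATE STREAM clicks_json WITH (KAFKA_TOPIC='clicks_json', VALUE_FORMAT='JSON') AS
  SELECT * FROM clicks_avro;

-- And the reverse: re-encode a JSON topic as Avro for Connect/NiFi consumers
CREATE STREAM events_json (userid VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC='events', VALUE_FORMAT='JSON');

CREATE STREAM events_avro WITH (KAFKA_TOPIC='events_avro', VALUE_FORMAT='AVRO') AS
  SELECT * FROM events_json;
```

Note the JSON source stream needs its columns declared explicitly, whereas the Avro stream can infer them from the Schema Registry.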
And FWIW, look at the comments section of your first link:
This integration is very different from KSQL.
- The primary use case here is to allow users to actually unleash full SQL query use cases against any Kafka topic. https://github.com/apache/hive/tree/master/kafka-handler#query-table
- You can use it to atomically move data into and out of Kafka itself. https://github.com/apache/hive/tree/master/kafka-handler#query-table
- Query the Kafka Stream as part of the entire Data warehouse like ORC/Parquet tables, Druid Tables, HDFS, S3… etc.
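To illustrate that last point, once a topic is mapped as a table, joining it against regular warehouse tables is plain HiveQL; a sketch with hypothetical table names:

```sql
-- Join a Kafka-backed table against an ORC dimension table
SELECT u.name, COUNT(*) AS clicks
FROM kafka_clicks k
JOIN users_orc u
  ON k.userid = u.id
GROUP BY u.name;
```

The join runs through Hive's normal execution engine, so the Kafka side is scanned like any other external table rather than processed incrementally.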
