What are the differences? Which one is better? When should each be used?
1 Answer
Installation
KSQL uses Kafka Streams and does not depend on Hive; it only needs Kafka and ZooKeeper.
Hive-Kafka requires Kafka, a HiveServer, and an RDBMS (MySQL, Postgres, etc.).
EcoSystem
For external integrations, Hive-Kafka does not offer Confluent Avro Schema Registry integration. It might (eventually?) offer Hortonworks Schema Registry integration, though.
Hortonworks' suite of tools around NiFi, Spark, Kafka, SMM, Atlas, Ranger, Hive-Streaming, etc. is probably all well tested together.
Confluent partners with other companies to ensure proper integrations with tools beyond Kafka and their own platform.
Interface
AFAIK, Hive-Kafka is only a query engine; it will not create/maintain KStream/KTable instances like KSQL does, and will always require a scan of the Kafka topic. It also has no native REST interface for submitting queries, so the only option for external access is JDBC/ODBC.
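For reference, Hive-Kafka works by mapping a topic to an external table via the Kafka storage handler; a minimal sketch might look like this (the topic name and columns here are hypothetical):

```sql
-- Map a Kafka topic as an external Hive table (names are illustrative)
CREATE EXTERNAL TABLE kafka_clicks (
  `userid` STRING,
  `url` STRING,
  `ts` TIMESTAMP
)
STORED BY 'org.apache.hadoop.hive.kafka.KafkaStorageHandler'
TBLPROPERTIES (
  "kafka.topic" = "clicks",
  "kafka.bootstrap.servers" = "localhost:9092"
);
```

Every query against this table scans the topic, though the handler also exposes metadata columns such as `__partition`, `__offset`, and `__timestamp` that can be used in predicates to limit the scan.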
For a UI, Hive works well with Hue or Ambari Views, which are both open-source, but KSQL primarily only has Confluent Control Center, which is a paid solution.
"Better" is an opinion, but if you already have Hive, I see no reason not to use Hive-Kafka.
IMO, KSQL can complement Hive-Kafka by defining new topics as both tables and streams, as well as transforming/filtering Confluent's Avro format into JSON that Hive-Kafka can natively understand. From there you can join existing Hive data (HDFS, S3, HBase, etc.) with Hive-Kafka data, though there will likely be performance impacts.
Similarly, you can take Hive-Kafka topics and translate them into Avro in KSQL using the Schema Registry, to use with other tools like Kafka Connect or NiFi to have a more efficient wire format (binary-avro vs. json).
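As a sketch of that round-trip in KSQL (all stream and topic names below are hypothetical):

```sql
-- Avro topic -> JSON topic that Hive-Kafka can read directly
-- (the schema is pulled from the Confluent Schema Registry)
CREATE STREAM clicks_avro WITH (KAFKA_TOPIC='clicks', VALUE_FORMAT='AVRO');

CREATE STREAM clicks_json WITH (KAFKA_TOPIC='clicks_json', VALUE_FORMAT='JSON') AS
  SELECT * FROM clicks_avro;

-- And the reverse: re-encode a JSON topic as Avro for Connect/NiFi consumers
CREATE STREAM events_json (userid VARCHAR, url VARCHAR)
  WITH (KAFKA_TOPIC='events', VALUE_FORMAT='JSON');

CREATE STREAM events_avro WITH (KAFKA_TOPIC='events_avro', VALUE_FORMAT='AVRO') AS
  SELECT * FROM events_json;
```

Note the JSON source stream needs its columns declared explicitly, whereas the Avro stream can infer them from the Schema Registry.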
And FWIW, look at the comments section of your first link:
This integration is very different from KSQL.
- The primary use case here is to allow users to actually unleash full SQL query use cases against any Kafka topic. https://github.com/apache/hive/tree/master/kafka-handler#query-table
- You can use it to atomically move data into and out of Kafka itself. https://github.com/apache/hive/tree/master/kafka-handler#query-table
- Query the Kafka Stream as part of the entire Data warehouse like ORC/Parquet tables, Druid Tables, HDFS, S3… etc.
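To illustrate that last point, once a topic is mapped as a table, joining it against regular warehouse tables is plain HiveQL; a sketch with hypothetical table names:

```sql
-- Join a Kafka-backed table against an ORC dimension table
SELECT u.name, COUNT(*) AS clicks
FROM kafka_clicks k
JOIN users_orc u
  ON k.userid = u.id
GROUP BY u.name;
```

The join runs through Hive's normal execution engine, so the Kafka side is scanned like any other external table rather than processed incrementally.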
