What is the difference? Which one is better? When should each be used?

Hive Kafka SQL

KSQL

sharan jain

1 Answer

Installation

KSQL uses Kafka Streams and does not depend on Hive, only on Kafka and Zookeeper.

Hive-Kafka requires Kafka, a HiveServer, and an RDBMS (MySQL, Postgres, etc.).

Ecosystem

For external integrations, Hive-Kafka does not offer Confluent Avro Schema Registry integration. It might (eventually?) offer Hortonworks Schema Registry integration, though.

Hortonworks' suite of tools around NiFi, Spark, Kafka, SMM, Atlas, Ranger, Hive-Streaming, etc. are probably all well tested together.

Confluent partners with other companies to ensure that tools outside of Kafka and the Confluent Platform integrate properly.

Interface

AFAIK, Hive-Kafka is only a query engine: it will not create or maintain KStream/KTable instances like KSQL does, and it will always require a scan of the Kafka topic. It also has no native REST interface for submitting queries, so the only option for external access would be JDBC/ODBC.

For a UI, Hive works well with HUE or Ambari Views, both of which are open source, whereas KSQL primarily has only Confluent Control Center, which is a paid product.
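To illustrate the REST side of that difference, here is a minimal sketch of how a statement could be packaged for a KSQL server's `/ksql` endpoint using only the Python standard library. The server address `http://localhost:8088` and the helper name `build_ksql_request` are assumptions for illustration; the request is only built here, not sent, so adjust and verify against your own KSQL version before relying on it.

```python
import json
import urllib.request


def build_ksql_request(statement, server="http://localhost:8088"):
    """Build (but do not send) a POST request for a KSQL server's /ksql endpoint.

    `server` is an assumed default address; point it at your own KSQL server.
    """
    # The endpoint expects a JSON body with the statement text and
    # optional per-request streams properties.
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=server.rstrip("/") + "/ksql",
        data=body,
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


request = build_ksql_request("SHOW STREAMS;")

# To actually submit it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

With Hive-Kafka there is no equivalent HTTP call; the same kind of query would go through a JDBC/ODBC connection to HiveServer instead.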

"Better" is a matter of opinion, but if you already have Hive, I see no reason not to use Hive-Kafka.

IMO, KSQL can complement Hive-Kafka by defining new topics as both tables and streams, as well as by transforming/filtering Confluent's Avro format into JSON that Hive-Kafka can natively understand. From there you can join existing Hive data (HDFS, S3, HBase, etc.) with Hive-Kafka data, though that will likely come with a performance cost.

Similarly, you can take Hive-Kafka topics and translate them into Avro in KSQL using the Schema Registry, for use with other tools like Kafka Connect or NiFi, which gives you a more efficient wire format (binary Avro vs. JSON).
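As a rough sketch of that Avro-to-JSON step, the helper below only generates the two KSQL statements involved: one registering a stream over an existing Avro topic (with the schema coming from the Schema Registry), and one creating a JSON-formatted copy. The function name and every topic and stream name are hypothetical, chosen purely for illustration; the statements would still need to be run through the KSQL CLI or REST API and checked against your KSQL version.

```python
def avro_to_json_statements(source_topic, source_stream, target_stream, target_topic):
    """Return KSQL statements that re-serialize an Avro topic as JSON.

    All names passed in are placeholders supplied by the caller.
    """
    # Register a stream over the existing Avro topic; with AVRO as the
    # value format, the columns are taken from the Schema Registry.
    register = (
        f"CREATE STREAM {source_stream} "
        f"WITH (KAFKA_TOPIC='{source_topic}', VALUE_FORMAT='AVRO');"
    )
    # Continuously copy every record into a new topic written as JSON.
    convert = (
        f"CREATE STREAM {target_stream} "
        f"WITH (KAFKA_TOPIC='{target_topic}', VALUE_FORMAT='JSON') "
        f"AS SELECT * FROM {source_stream};"
    )
    return [register, convert]


# Hypothetical names, for illustration only.
statements = avro_to_json_statements(
    source_topic="orders_avro",
    source_stream="orders_avro_stream",
    target_stream="orders_json_stream",
    target_topic="orders_json",
)
for statement in statements:
    print(statement)
```

The resulting JSON topic (`orders_json` in this sketch) is the one a Hive-Kafka table would then be pointed at.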


And FWIW, look at the comments section of your first link:

"This integration is very different from KSQL."

OneCricketeer