I recently came across Apache Kylin and was curious what its use cases are. From what I can tell, it is a tool designed for a very specific problem: aggregating, caching, and querying data from other sources (HBase, Hadoop, Hive) at a scale of 10+ billion rows. Am I correct in this assumption?

unseen_damage
  • Look at: http://www.ebaytechblog.com/2014/10/20/announcing-kylin-extreme-olap-engine-for-big-data/ – Ravindra babu Mar 07 '16 at 15:57
  • That doesn't really give generic use cases; rather, it explains how eBay used it (as eBay developed the tool in the first place) and the architecture of the product. – unseen_damage Mar 07 '16 at 18:18

1 Answer

Apache Kylin's use case is interactive big data analysis on Hadoop. It lets you query big Hive tables at sub-second latency in three simple steps:

  1. Identify a set of Hive tables in a star schema.
  2. Build a cube from the Hive tables in an offline batch process.
  3. Query the Hive tables using SQL and get results in under a second, via the REST API, ODBC, or JDBC.
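
As a rough illustration of step 3, here is a minimal Python sketch that builds a query request for the REST API. It assumes the `POST /kylin/api/query` endpoint with HTTP Basic authentication and a JSON body, as I understand it from the Kylin REST documentation, so check it against your Kylin version. The host, project name, table names, and credentials are all placeholders (ADMIN/KYLIN is, as far as I know, only the out-of-the-box default).

```python
import base64
import json
import urllib.request


def build_kylin_query(base_url, project, sql, user, password, limit=100):
    """Build (but do not send) a POST request for Kylin's query endpoint."""
    payload = {
        "sql": sql,
        "project": project,
        "offset": 0,
        "limit": limit,
        "acceptPartial": False,
    }
    # HTTP Basic auth: base64 of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/kylin/api/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical star-schema query: one fact table joined to one dimension table.
SQL = """
SELECT d.region, SUM(f.price) AS total_sales
FROM sales_fact f
JOIN region_dim d ON f.region_id = d.region_id
GROUP BY d.region
"""

request = build_kylin_query(
    "http://localhost:7070", "my_project", SQL, "ADMIN", "KYLIN"
)

# Sending needs a running Kylin server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)["results"]
```

Note that the SQL is written against the Hive table names; Kylin decides which cube can answer it.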

The use case is quite general: Kylin can query any Hive tables quickly, as long as you can define a star schema and model cubes from the tables. Check out the Kylin terminology if you are not sure what a star schema or a cube is.

Kylin provides an ANSI SQL interface, so you can query the Hive tables much the same way you are used to. One limitation, however, is that Kylin provides only aggregated results; in other words, the SQL should contain a "group by" clause to yield a correct result. This is usually fine because big data analysis focuses more on aggregated results than on individual records.
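
To see why only aggregated results come back, here is a toy Python sketch of the cube idea. It is a conceptual illustration only, not how Kylin actually stores or builds cubes, and the rows, dimension names, and helper functions are all made up for the example. The offline step precomputes a sum for every combination of dimensions; a query is then a plain lookup.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: two dimensions (region, year) and one measure (price).
ROWS = [
    {"region": "EU", "year": 2015, "price": 10.0},
    {"region": "EU", "year": 2016, "price": 20.0},
    {"region": "US", "year": 2015, "price": 5.0},
    {"region": "US", "year": 2015, "price": 7.0},
]
DIMENSIONS = ("region", "year")


def build_cube(rows, dimensions, measure):
    """Offline step: precompute SUM(measure) for every combination of dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                totals[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, group_by):
    """Answer 'SELECT ..., SUM(measure) ... GROUP BY group_by' by lookup only."""
    all_dims = max(cube, key=len)  # the full combination fixes the dimension order
    dims = tuple(d for d in all_dims if d in group_by)
    return cube[dims]


cube = build_cube(ROWS, DIMENSIONS, "price")
print(query(cube, ["region"]))  # {('EU',): 30.0, ('US',): 12.0}
print(query(cube, []))          # {(): 42.0}
```

The two US rows from 2015 are merged into a single total in every entry of the toy cube, so an individual record cannot be recovered from it. That is the intuition behind the "group by" requirement.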

Li Yang
  • Would the following use case be valid? a) Design a star schema using HiveQL b) Load data into Hadoop using a Hive connector c) Kylin would handle mapping the Hive schema to a cube schema and executing the Map/Reduce through HiveQL d) Output the results of the query to HBase e) Use Kylin for executing SQL statements via Calcite to HBase and returning the results as JSON. – unseen_damage Mar 16 '16 at 15:12
  • Very close! Just note that step c) is manual: the mapping from Hive schema to cube schema is done by hand, and Kylin provides a GUI for it. Also, in step d) the output is not specific to any query; it is a general index of the Hive data, which we call a "cube". – Li Yang Mar 19 '16 at 00:06
  • @LiYang, so Kylin is often used by data analysts to query data interactively? Can we integrate Kylin queries into a web application (e.g. using Kylin to provide a data API to a report server)? – mingchau Apr 02 '19 at 02:59
  • Yes, Kylin can serve web applications too, via JDBC / ODBC / REST API. – Li Yang Apr 05 '19 at 10:51