
There are multiple tools built to access data from Hadoop.

Among the most popular are Hive and Impala. While Impala was built to address the batch nature of Hive (for low-cost SQL queries), it cannot eliminate MapReduce completely, as MapReduce is a very good framework for dealing with batch data.

For low-cost SQL queries, Impala gives dramatically better performance because it skips MapReduce jobs.

What exactly causes Impala to be faster than Hive? Is it in-memory execution? Or is it efficient and intelligent usage of the existing hardware (NameNodes and DataNodes)?

funsuk
  • Impala has a book that explains this in depth: https://www.safaribooksonline.com/library/view/learning-cloudera-impala/9781783281275/ch07s04.html Here's another article as well: https://www.linkedin.com/pulse/20140910142911-22744472-why-is-impala-faster-than-hive – TayTay Oct 30 '15 at 17:59
  • This is a good technical overview, published by IBM researchers: http://www.vldb.org/pvldb/vol7/p1295-floratou.pdf – Matt Oct 30 '15 at 19:29
