How exactly is Impala faster than Hive?

Asked Oct 30 '15 at 17:53

Active Oct 30 '15 at 17:59

Viewed 198 times

There are multiple tools built to access data from Hadoop.

Two of the most popular are Hive and Impala. Impala was built to address the batch-oriented nature of Hive for low-cost SQL queries, but it cannot replace MapReduce completely, because MapReduce is still a very good framework for processing batch data.

For low-cost SQL queries, Impala performs dramatically better because it skips MapReduce jobs.

What exactly makes Impala faster than Hive? Is it in-memory execution? Or is it more efficient and intelligent use of the existing hardware (NameNodes and DataNodes)?

edited Oct 30 '15 at 17:59

asked Oct 30 '15 at 17:53

funsuk

There is a book on Impala that explains this in depth: https://www.safaribooksonline.com/library/view/learning-cloudera-impala/9781783281275/ch07s04.html Here's another article as well: https://www.linkedin.com/pulse/20140910142911-22744472-why-is-impala-faster-than-hive – TayTay Oct 30 '15 at 17:59

This is a good technical overview, published by IBM researchers: http://www.vldb.org/pvldb/vol7/p1295-floratou.pdf – Matt Oct 30 '15 at 19:29

0 Answers