In a Hadoop cluster, should Hive be installed on all nodes?

Question

I am a newbie to Hadoop / Hive and I have just started reading the docs. There are lots of blogs on installing Hadoop in cluster mode. Also, I know that Hive runs on top of Hadoop. My question is: Hadoop is installed on all the cluster nodes. Should I also install Hive on all the cluster nodes or only on the master node?

This post explains the relationship with an image: [Hive and MR relation](https://stackoverflow.com/q/40510851/1592191) – mrsrinivas, Nov 14 '17 at 12:16

score 36 · Accepted Answer · answered Dec 10 '11 at 11:27


No, it is not something you install on worker nodes. Hive is a Hadoop client. Just run Hive according to the instructions you see at the Hive site.

answered Dec 10 '11 at 11:27

Sean Owen



Thanks Sean for the quick reply. It helped me clear my doubt. – Vijay Dec 10 '11 at 13:45

To add to Sean's answer: Hive converts HiveQL into MapReduce (MR) jobs on the client side, so the Hadoop framework is not aware of Hive. The same is true of Pig / Pig Latin. – Praveen Sripati Dec 10 '11 at 16:47
Thanks. I installed Hive on a slave machine of my YARN cluster, and the queries are successfully converted into MR jobs. – prabhugs Feb 05 '16 at 10:33

score 3 · Answer 2 · edited Aug 28 '19 at 16:14


From Cloudera's Hive installation Guide:

Install Hive on your client machine(s) from which you submit jobs; you do not need to install it on the nodes in your Hadoop cluster.
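As a rough illustration of "the client machine from which you submit jobs", here is a minimal Python sketch that submits one HiveQL statement through the Hive command-line client. It assumes the classic `hive` command is on the client's `PATH` and that the client is already configured to reach the cluster. The helper names `build_hive_command` and `run_on_client` are hypothetical, made up for this example.

```python
import shutil
import subprocess


def build_hive_command(query, hive_binary="hive"):
    """Return the argument list that runs one HiveQL statement via `hive -e`."""
    return [hive_binary, "-e", query]


def run_on_client(query, hive_binary="hive"):
    """Run a query from this machine; only this client needs Hive installed."""
    # Check that the Hive CLI exists on the client before trying to use it.
    if shutil.which(hive_binary) is None:
        raise RuntimeError(
            "Hive CLI not found on PATH; install Hive on this client machine"
        )
    # Per the answers above, Hive turns the query into MapReduce jobs on the
    # client side and submits them to the cluster like any other Hadoop job.
    result = subprocess.run(
        build_hive_command(query, hive_binary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `run_on_client("SHOW TABLES")` on a machine with Hive installed would print the table list; the worker nodes only need Hadoop.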

edited Aug 28 '19 at 16:14

Tiago Martins Peres


answered Mar 13 '15 at 10:56

score 0 · Answer 3 · answered Oct 26 '16 at 06:57


Hive is mainly used for processing structured and semi-structured data in Hadoop. It can also analyze large datasets stored in HDFS or in the Amazon S3 filesystem. To query data, Hive provides a query language called HiveQL, which is similar to SQL, so ad hoc queries for data analysis are easy to run. With Hive we don’t need to write complex MapReduce jobs; we just submit SQL-like queries, and Hive converts them into MapReduce jobs.

Because HiveQL is ultimately converted into MapReduce jobs, and we don't have to submit MapReduce jobs from every node in a Hadoop cluster, we likewise don't need Hive installed on every node of the cluster.

answered Oct 26 '16 at 06:57

Vikas Singh


As you mentioned, Hive converts the query into a MapReduce job, so how does the MapReduce job run without a Hadoop cluster? Say I have Hive on S3 and I run a query on Hive, which gets converted to MapReduce. If I don't have a running cluster, how does this work? – Arun May 01 '18 at 16:32
@Arun - It doesn't. You can't "have Hive on S3". You can use S3 as a filesystem for Hadoop, but the Hadoop cluster that uses S3 has to be an actual cluster of compute nodes with Hadoop installed. – mhaken May 03 '18 at 18:04
