I am a newbie to Hadoop/Hive and have just started reading the docs. There are lots of blogs on installing Hadoop in cluster mode, and I know that Hive runs on top of Hadoop. My question is: Hadoop is installed on all the cluster nodes. Should I also install Hive on all the cluster nodes, or only on the master node?
This post explains the relationship with a diagram: [Hive and MR relation](https://stackoverflow.com/q/40510851/1592191) – mrsrinivas Nov 14 '17 at 12:16
3 Answers
No, it is not something you install on worker nodes. Hive is a Hadoop client. Just run Hive according to the instructions on the Hive site.

To add to Sean's answer: Hive converts HiveQL into an MR job on the client side, and the Hadoop framework isn't aware of Hive. The same is true of Pig/Pig Latin. – Praveen Sripati Dec 10 '11 at 16:47
Thanks. I installed Hive on a slave machine of my YARN cluster, and the queries were successfully converted into MR jobs. – prabhugs Feb 05 '16 at 10:33
From Cloudera's Hive installation Guide:
Install Hive on your client machine(s) from which you submit jobs; you do not need to install it on the nodes in your Hadoop cluster.

Hive is mainly used for processing structured and semi-structured data in Hadoop. With Hive you can also analyze large datasets stored in HDFS or in the Amazon S3 filesystem. To query that data, Hive provides a SQL-like query language called HiveQL, which makes it easy to run ad-hoc queries for data analysis. With Hive we don't need to write complex MapReduce jobs; we just submit SQL queries, and Hive converts them into MapReduce jobs.
Since HiveQL ultimately becomes MapReduce jobs, and we don't have to submit MapReduce jobs from every node in a Hadoop cluster, we likewise don't need Hive installed on every node of the cluster.
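To make the client-side translation concrete, here is a toy Python sketch. It is not Hive's actual query planner: the hypothetical `compile_group_by_count` function plays the role of the Hive client and turns a query like `SELECT word, COUNT(*) FROM words GROUP BY word` into a map function and a reduce function, while the hypothetical `run_job` function stands in for the Hadoop framework, which only runs generic map, shuffle/sort, and reduce steps and never sees the SQL.

```python
from itertools import groupby

# Toy model only (hypothetical names, not Hive's real planner or API).
# "Client" side: translate
#   SELECT <key_column>, COUNT(*) FROM table GROUP BY <key_column>
# into plain map/reduce callables.
def compile_group_by_count(key_column):
    def mapper(row):
        # Emit (group key, 1) for every input row.
        yield row[key_column], 1

    def reducer(key, values):
        # COUNT(*) per group is the sum of the emitted 1s.
        return key, sum(values)

    return mapper, reducer


# "Framework" side: knows nothing about SQL, only map -> sort by key -> reduce.
def run_job(rows, mapper, reducer):
    pairs = [pair for row in rows for pair in mapper(row)]
    pairs.sort(key=lambda kv: kv[0])  # stand-in for the shuffle/sort phase
    return [
        reducer(key, (value for _, value in group))
        for key, group in groupby(pairs, key=lambda kv: kv[0])
    ]


if __name__ == "__main__":
    rows = [{"word": "hive"}, {"word": "hadoop"}, {"word": "hive"}]
    mapper, reducer = compile_group_by_count("word")
    print(run_job(rows, mapper, reducer))  # [('hadoop', 1), ('hive', 2)]
```

In this model, only the compiled map/reduce job reaches the framework, which mirrors the point above: the query compilation happens where the client runs, not on every node.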

As you mentioned, Hive converts the query into a MapReduce job, so how do those MapReduce jobs run without a Hadoop cluster? Say I have Hive on S3 and I run a query on Hive, which gets converted to MapReduce. If I don't have a running cluster, how does this work? – Arun May 01 '18 at 16:32
@Arun: It doesn't. You can't "have Hive on S3". You can use S3 as a filesystem for Hadoop, but the Hadoop cluster that uses S3 still has to be an actual cluster of compute nodes with Hadoop installed. – mhaken May 03 '18 at 18:04