We are working on a Greenplum cluster with HAWQ installed. I would like to run a Hadoop Streaming job. However, it seems that Hadoop is not configured or started. How can I start MapReduce to make sure that I can use Hadoop Streaming?
3 Answers
Try the command below to get a word count:
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input <inputDir> \
-output <outputDir> \
-mapper /bin/cat \
-reducer /bin/wc
If that gives you the correct word count, it's working; otherwise, check the error that's printed when you run the command.

First, make sure that the cluster is started and working. To check, go to the Pivotal Command Center (usually at a link like https://<admin_node>:5443/) and view the cluster status, or ask your administrator to do so.
Next, make sure that you have the PHD client libraries installed on the machine from which you are trying to start your job. Run "rpm -qa | grep phd" to check.
Next, if the cluster is running and the libraries are installed, you can run the streaming job like this:
hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-streaming.jar -mapper /bin/cat -reducer /bin/wc -input /example.txt -output /testout
The /example.txt file must already exist on HDFS.
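If /example.txt is not on HDFS yet, you can stage it from the local filesystem first. A sketch, assuming the PHD client is installed and the local path /tmp/example.txt is illustrative:

```shell
# Create a small local input file (contents are just an example).
printf 'one two\nthree\n' > /tmp/example.txt

# Copy it into the HDFS root, if the hadoop client is available on this host.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -put /tmp/example.txt /example.txt   # upload to HDFS
  hadoop fs -ls /example.txt                     # verify the upload
else
  echo "hadoop client not found; install the PHD client packages first"
fi
```

Also note that the job will fail if the output directory (/testout above) already exists, so remove it with "hadoop fs -rmr /testout" before re-running.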

I did this a long time ago on Greenplum/Pivotal Hadoop:
1. Installation: icm_client deploy, e.g. icm_client deploy HIVE
2. Status:
HDFS:
service hadoop-namenode status
service hadoop-datanode status
service hadoop-secondarynamenode status
MapReduce:
service hadoop-jobtracker status
service hadoop-tasktracker status
Hive:
service hive-server status
service hive-metastore status
3. Start/stop/restart (Hive server as an example):
service hive-server start
service hive-server stop
service hive-server restart
Note: You will find all these commands and details in the installation guide, which may also be available online as the Hadoop installation guide.
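The status checks above can be scripted as a quick loop. A sketch only; the service names assume the Pivotal HD packaging described in this answer:

```shell
# Loop over the PHD/Hive service names and report each one's status.
# Falls back to a message if a service is not running or not installed.
for svc in hadoop-namenode hadoop-datanode hadoop-secondarynamenode \
           hadoop-jobtracker hadoop-tasktracker hive-server hive-metastore; do
  if command -v service >/dev/null 2>&1; then
    service "$svc" status || echo "$svc: not running or not installed"
  else
    echo "$svc: 'service' command not available on this host"
  fi
done
```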
Thanks,
