I've installed Hadoop 3.2.1 on Ubuntu 20.04 in VirtualBox for a college assignment with a deadline, so I'm new to Hadoop. I've been searching several sources on the internet for how to run MapReduce on Hadoop.

But when I type this in the terminal:

hadoop jar '/home/tamminen/WordCountTutorial/firstTutorial.jar' WordCount /WordCountTutorial/Input /WordCountTutorial/Output

which follows the format:

hadoop jar <JAR_FILE> <CLASS_NAME> <HDFS_INPUT_DIRECTORY> <HDFS_OUTPUT_DIRECTORY>

the command's output looks like this:

2020-10-11 18:59:04,584 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:05,595 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:06,598 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:07,618 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:08,619 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:09,621 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:10,624 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:11,625 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:12,627 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:13,629 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-10-11 18:59:13,632 INFO retry.RetryInvocationHandler: java.net.ConnectException: Call From 18k10018-data-mining/10.0.2.15 to localhost:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 3 failover attempts. Trying to failover after sleeping for 34444ms.

As a result, I cannot run hadoop dfs -cat <HDFS_OUTPUT_DIRECTORY>* either.

And these are the Hadoop configuration files I've changed:

core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dataflair.groups</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.dataflair.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.server.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.server.groups</name>
    <value>*</value>
    </property>
</configuration> 

hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx4096m</value>
    </property>
    <property>
        <name>mapreduce.application.classpath</name>
        <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8032</value>
    </property>
    <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
    </property>
</configuration>

and finally hadoop-env.sh:

...
# Extra Java runtime options for all Hadoop commands. We don't support
# IPv6 yet/still, so by default the preference is set to IPv4.
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true"
export HADOOP_OPTS="-Xmx5096m"  # <-- besides JAVA_HOME, this is the only line I added, based on a Hadoop tutorial I found
# For Kerberos debugging, an extended option set logs more information
# export HADOOP_OPTS="-Djava.net.preferIPv4Stack=true -Dsun.security.krb5.debug=true -Dsun.security.spnego.debug"
...

Can anyone explain why this error occurs, and what I should do so that hadoop jar works?

pup_in_the_tree
  • If you run `jps`, is YARN running? – OneCricketeer Oct 12 '20 at 15:05
  • I think not; only 3206 DataNode, 15415 Jps, 3468 SecondaryNameNode, and 3055 NameNode – pup_in_the_tree Oct 12 '20 at 15:30
  • That would explain the error connecting to YARN, then. I assume you ran `start-yarn`? – OneCricketeer Oct 12 '20 at 17:33
  • At first I didn't run start-yarn. After your suggestion I ran it, and now jps shows: 29025 NodeManager, 3206 DataNode, 28824 ResourceManager, 29192 Jps, 3468 SecondaryNameNode, 3055 NameNode. But when I run the hadoop jar command, it still prints: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused, while invoking ApplicationClientProtocolPBClientImpl.getNewApplication over null after 5 failover attempts. Trying to failover after sleeping for 42681ms. – pup_in_the_tree Oct 13 '20 at 01:24
  • Have you read the suggestions in that link? – OneCricketeer Oct 13 '20 at 14:14
  • Yes, but I'm still confused by the link's suggestions – pup_in_the_tree Oct 13 '20 at 15:23
  • Well, `connect to server: localhost/127.0.0.1:8032` would say its trying to connect to YARN. You can look at the YARN log files to see if it actually started successfully before running your app – OneCricketeer Oct 13 '20 at 15:46
  • I put the Hadoop files not in /, but in /home/Downloads/hadoop-3.2.1. Where can I find the YARN log files, so I can look at them and find the problem? – pup_in_the_tree Oct 13 '20 at 16:49
  • There should be a logs folder that gets created there. – OneCricketeer Oct 13 '20 at 17:58
  • So I shouldn't put hadoop-3.2.1 in Downloads, and should move it to /usr/local/hadoop? – pup_in_the_tree Oct 14 '20 at 12:47
  • It can be located anywhere. You shouldn't put it in your user folder, no, but that's not what I said. There's log files that get created somewhere – OneCricketeer Oct 14 '20 at 14:46
  • You mean /home/Downloads/hadoop-3.2.1/logs? There are many files named like hadoop-(username)-(datanode/namenode/nodemanager/resourcemanager/secondarynamenode)-(device name).(log/out/out.1 etc.), for example hadoop-tamminen-resourcemanager-18k10018-data-mining.log. There is also a SecurityAuth-tamminen.audit file and a userlogs folder, which is empty. If that's what you mean, what should I do? – pup_in_the_tree Oct 14 '20 at 15:28
  • As mentioned yarn isn't starting or being connected to, so it's the resourcemanager or node manager files that typically contain some error – OneCricketeer Oct 14 '20 at 15:36
  • I skimmed my files (because they are so long). In hadoop-tamminen-resourcemanager-18k10018-data-mining.log there seems to be an error like: java.net.BindException: Problem binding to [localhost:8032]. And in hadoop-tamminen-nodemanager-18k10018-data-mining.log there seem to be errors like: ERROR org.apache.hadoop.yarn.server.nodemanager.NodeManager : Error starting NodeManager, and: Call from 10.0.2.15/10.0.2.15 to localhost:8032 failed on connection. So the error is at localhost:8032? (Is that the log file you mean?) – pup_in_the_tree Oct 14 '20 at 16:17

1 Answer


This can happen because Hadoop sometimes starts services on the server's internal IP address instead of on localhost/127.0.0.1. Try changing 127.0.0.1 to your server's actual IP address in all of the Hadoop config files and see if it works. The other way around is to edit the /etc/hosts file as root and map localhost to your server's actual IP.
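As a sketch of the /etc/hosts approach (the IP 10.0.2.15 and hostname 18k10018-data-mining are taken from the question's own error messages; substitute your machine's values):

```shell
# /etc/hosts (edit as root) -- map the hostname to the VM's real address:
#   127.0.0.1    localhost
#   10.0.2.15    18k10018-data-mining

# After changing /etc/hosts or any *-site.xml file, restart the daemons
# so they rebind to the new addresses:
stop-yarn.sh
stop-dfs.sh
start-dfs.sh
start-yarn.sh
jps    # ResourceManager and NodeManager should now appear in the list
```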

For more precise instructions, follow the article below: https://hadooptutorials.info/2020/10/05/part-1-apache-hadoop-installation-on-single-node-cluster-with-google-cloud-virtual-machine/
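To confirm whether the ResourceManager actually came up after a restart, a generic TCP probe of its RPC port is enough. This is a plain bash check (using the /dev/tcp built-in redirection), not a Hadoop tool, and 8032 is simply the port the question's yarn-site.xml configures:

```shell
#!/usr/bin/env bash
# Probe the ResourceManager RPC port using bash's built-in /dev/tcp.
host=localhost
port=8032
if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
    echo "something is listening on ${host}:${port}"
else
    echo "nothing is listening on ${host}:${port} - check the resourcemanager log"
fi
```

If the probe fails while jps claims the ResourceManager is running, the daemon has likely bound a different interface, which is exactly the situation the /etc/hosts fix above addresses.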