
I'm trying to run a word count program with Hadoop 3.2.1 on an Ubuntu 20.04 virtual machine. The job log reports that "resource-types.xml" was not found, and although the job is shown as running, it never produces any output.

mapred-site.xml

<property> 
  <name>mapreduce.framework.name</name> 
  <value>yarn</value> 
</property>
<property>
 <name>yarn.app.mapreduce.am.env</name>
 <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
 <name>mapreduce.map.env</name>
 <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
 <name>mapreduce.reduce.env</name>
 <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property> 

yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>127.0.0.1</value>
</property>
<property>
  <name>yarn.acl.enable</name>
  <value>0</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>   
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PERPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>

</configuration>

core-site.xml

<configuration>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hdoop/tmpdata</value>
</property>
<property>
  <name>fs.default.name</name>
  <value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>

hdfs-site.xml

<configuration>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/richa/hadoop/data/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/richa/hadoop/data/datanode</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
</configuration>

When I run my hadoop jar command, I get the following:

richa@richa-VirtualBox:~$ hadoop jar /home/richa/wc.jar WordCount /home/richa/input/wc_input.txt /home/richa/output
2020-08-31 08:37:38,144 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:8032
2020-08-31 08:37:39,986 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2020-08-31 08:37:40,049 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/richa/.staging/job_1598842809949_0002
2020-08-31 08:37:40,419 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-31 08:37:40,862 INFO input.FileInputFormat: Total input files to process : 1
2020-08-31 08:37:41,020 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-31 08:37:41,512 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-31 08:37:41,573 INFO mapreduce.JobSubmitter: number of splits:1
2020-08-31 08:37:42,344 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-08-31 08:37:42,506 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1598842809949_0002
2020-08-31 08:37:42,507 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-08-31 08:37:43,130 INFO conf.Configuration: resource-types.xml not found
2020-08-31 08:37:43,131 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-08-31 08:37:43,351 INFO impl.YarnClientImpl: Submitted application application_1598842809949_0002
2020-08-31 08:37:43,462 INFO mapreduce.Job: The url to track the job: http://richa-VirtualBox:8088/proxy/application_1598842809949_0002/
2020-08-31 08:37:43,464 INFO mapreduce.Job: Running job: job_1598842809949_0002

I can't work out where I am going wrong. I have included almost all of the jar files. Did I miss something in my mapred-site.xml? Or should I wait longer for the job to complete? How long should a small program like this take to run? All my environment variables are correct as well.

Thank you in advance!

Richa

1 Answer

I spent two hours searching for an answer. Finally I asked ChatGPT to give me a sample "resource-types.xml" file (see below). Then I asked it which directory the file belonged in, and it told me: in the "etc/hadoop" directory of your Hadoop installation. I created the file, put it there, and bingo: my Hadoop job ran.

<?xml version="1.0"?>
<configuration>
  <resources>
    <resourceType name="GPU" units="NONE">
      <schedulerInclude>true</schedulerInclude>
      <yarnInclude>true</yarnInclude>
    </resourceType>
    <resourceType name="FPGA" units="NONE">
      <schedulerInclude>true</schedulerInclude>
      <yarnInclude>true</yarnInclude>
    </resourceType>
  </resources>
</configuration>
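
One caveat on the file above: the lines in the question that mention resource-types.xml are logged at INFO level, not as errors. Also, as far as I can tell, the Hadoop documentation describes resource-types.xml in the same name/value property format as the other site files, not with resourceType elements. A minimal sketch, assuming no custom resource types (such as GPUs) are actually needed, would be an empty configuration:

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: no custom resource types are needed, so the file stays empty.
       Custom types would be declared as ordinary properties, for example a
       yarn.resource-types property whose value lists the resource names. -->
</configuration>
```

This only gives Hadoop a file to find. Whether that alone explains why the job started producing output is not something I have verified.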
kurtfriedrich