
I am new to Ceph. I have a 5-node cluster (Ubuntu 14.04) on which I have set up Hadoop (1.1.1) and Ceph (v0.87). I want to use Hadoop with CephFS and run some experiments. The wordcount example runs fine with the normal Hadoop settings, and the Ceph cluster health is OK. But when I change the Hadoop configuration as described in the "Using Hadoop with CephFS" documentation (http://ceph.com/docs/master/cephfs/hadoop/), I get the following error (I have mounted CephFS with the kernel driver at /mnt/mycephfs):

ceph@admin-node:/usr/local/hadoop-1.1.1$ bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /mnt/mycephfs/wc-input /mnt/mycephfs/wc-output-425

15/04/14 20:47:00 INFO util.NativeCodeLoader: Loaded the native-hadoop library
15/04/14 20:47:00 INFO input.FileInputFormat: Total input paths to process : 1
15/04/14 20:47:00 WARN snappy.LoadSnappy: Snappy native library not loaded
15/04/14 20:47:01 INFO mapred.JobClient: Running job: job_201504142046_0001
15/04/14 20:47:02 INFO mapred.JobClient:  map 0% reduce 0%
15/04/14 20:47:03 INFO mapred.JobClient: Task Id : attempt_201504142046_0001_m_000021_0, Status : FAILED
Error initializing attempt_201504142046_0001_m_000021_0:
java.io.FileNotFoundException: File file:/app/hadoop/tmp/mapred/system/job_201504142046_0001/jobToken does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4445)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1272)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
        at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2568)
        at java.lang.Thread.run(Thread.java:745)

15/04/14 20:47:03 WARN mapred.JobClient: Error reading task outputhttp://node2:50060/tasklog?plaintext=true&attemptid=attempt_201504142046_0001_m_000021_0&filter=stdout
15/04/14 20:47:03 WARN mapred.JobClient: Error reading task outputhttp://node2:50060/tasklog?plaintext=true&attemptid=attempt_201504142046_0001_m_000021_0&filter=stderr
15/04/14 20:47:03 INFO mapred.JobClient: Task Id : attempt_201504142046_0001_r_000002_0, Status : FAILED
Error initializing attempt_201504142046_0001_r_000002_0:
java.io.FileNotFoundException: File file:/app/hadoop/tmp/mapred/system/job_201504142046_0001/jobToken does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4445)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1272)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
        at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2568)
        at java.lang.Thread.run(Thread.java:745) 

15/04/14 20:47:03 WARN mapred.JobClient: Error reading task outputhttp://node3:50060/tasklog?plaintext=true&attemptid=attempt_201504142046_0001_m_000021_1&filter=stdout
15/04/14 20:47:03 WARN mapred.JobClient: Error reading task outputhttp://node3:50060/tasklog?plaintext=true&attemptid=attempt_201504142046_0001_m_000021_1&filter=stderr
15/04/14 20:47:04 INFO mapred.JobClient: Task Id : attempt_201504142046_0001_r_000002_1, Status : FAILED
Error initializing attempt_201504142046_0001_r_000002_1:
java.io.FileNotFoundException: File file:/app/hadoop/tmp/mapred/system/job_201504142046_0001/jobToken does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:397)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:251)
        at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:4445)
        at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1272)
        at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1213)
        at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2568)
        at java.lang.Thread.run(Thread.java:745)
.....................

Using CephFS instead of HDFS requires only the MapReduce daemons, so only the JobTracker and TaskTrackers are running on the nodes (1 JobTracker, 4 TaskTrackers). Removing hadoop.tmp.dir, as suggested in another question, does not solve the problem. My Hadoop core-site.xml:

<configuration>


<property>
<name>fs.defaultFS</name>
<value>ceph://10.242.144.225:6789/</value>
</property>

<property>
<name>ceph.root.dir</name>
<value>/mnt/mycephfs</value>
</property>

<property>
<name>ceph.conf.file</name>
<value>/etc/ceph/ceph.conf</value>
</property>

<property>
<name>ceph.data.pools</name>
<value>data</value>
</property>

<property>
<name>fs.AbstractFileSystem.ceph.impl</name>
<value>org.apache.hadoop.fs.ceph.CephFs</value>
</property>

<property>
<name>fs.ceph.impl</name>
<value>org.apache.hadoop.fs.ceph.CephFileSystem</value>
</property>

</configuration>

My mapred-site.xml:

<configuration>
<property>
  <name>mapred.job.tracker</name>
  <value>10.242.144.212:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  Provide the ip address of your master node. The port number must be 54311 or 8021.
  </description>
</property>

<property>
<name>fs.defaultFS</name>
<value>ceph://10.242.144.225:6789/</value>
</property>

</configuration>
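
The `file:` scheme in the exception suggests the TaskTrackers are resolving the job's system directory against the local filesystem rather than Ceph. One thing that may be worth ruling out (an assumption, not a confirmed diagnosis for this cluster) is the property name: Hadoop 1.x reads the default filesystem from `fs.default.name`, while `fs.defaultFS` is the name used by later releases. The sketch below is a small standalone check, not part of Hadoop or Ceph; `read_properties`, `default_fs_settings`, and `SAMPLE` are hypothetical names introduced for illustration, and `SAMPLE` is an excerpt of the core-site.xml shown above.

```python
import xml.etree.ElementTree as ET

# Hadoop 1.x looks up fs.default.name; fs.defaultFS is the newer key name.
DEFAULT_FS_KEYS = ("fs.default.name", "fs.defaultFS")


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def default_fs_settings(props):
    """Return only the default-filesystem keys that are actually set."""
    return {key: props[key] for key in DEFAULT_FS_KEYS if key in props}


# Excerpt of the core-site.xml from the question (illustration only).
SAMPLE = """<configuration>
<property>
<name>fs.defaultFS</name>
<value>ceph://10.242.144.225:6789/</value>
</property>
<property>
<name>ceph.root.dir</name>
<value>/mnt/mycephfs</value>
</property>
</configuration>"""

if __name__ == "__main__":
    print(default_fs_settings(read_properties(SAMPLE)))
```

To check a real file, pass its contents (for example `open("conf/core-site.xml").read()`, where the path is an assumption about the install layout) to `read_properties`. If only `fs.defaultFS` shows up on a Hadoop 1.1.1 install, trying `fs.default.name` with the same value is a cheap experiment.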

Please let me know where I am making the mistake. Any help in this regard is truly appreciated.

  • possible duplicate of [error running Hadoop wordcount example](http://stackoverflow.com/questions/10303169/error-running-hadoop-wordcount-example) – Tristan Foureur Apr 14 '15 at 21:26
  • Removing hadoop.tmp.dir from core-site.xml, as suggested in [link](http://stackoverflow.com/questions/10303169/error-running-hadoop-wordcount-example), does not solve the problem – default_user Apr 15 '15 at 21:57
