
I am trying to submit a Giraph job to a Hadoop 1.2.1 cluster. The cluster has a NameNode master, a MapReduce (JobTracker) master, and four slaves. The job fails with the following exception:

java.util.concurrent.ExecutionException: java.lang.IllegalStateException: checkLocalJobRunnerConfiguration: When using LocalJobRunner, must have only one worker since only 1 task at a time!

However, here is my mapred-site.xml file:

<configuration>
     <property>
         <name>mapred.job.tracker</name>
         <value>job.tracker.private.ip:9001</value>
     </property>
     <property>
         <name>mapreduce.job.counters.limit</name>
         <value>1000</value>
     </property>
     <property>
         <name>mapred.tasktracker.map.tasks.maximum</name>
         <value>50</value>
     </property>
     <property>
         <name>mapred.tasktracker.reduce.tasks.maximum</name>
         <value>50</value>
     </property>
</configuration>

and my core-site.xml file:

<configuration>
     <property>
         <name>fs.default.name</name>
         <value>hdfs://name.node.private.ip:9000</value>
     </property>
</configuration>

Additionally, the JobTracker's masters file contains its own private IP and its slaves file contains the private IPs of the four slaves; the NameNode's masters and slaves files are set up the same way.
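
Both files are just one private IP per line. For illustration (these addresses are placeholders, not my actual ones):

# masters (placeholder)
10.0.1.10

# slaves (placeholders)
10.0.1.11
10.0.1.12
10.0.1.13
10.0.1.14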

I thought that setting mapred.job.tracker to the address of the MapReduce master would make Hadoop submit jobs to the remote JobTracker rather than the LocalJobRunner, but apparently not. How can I fix this?
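
As a sanity check (assuming the Hadoop CLI on the submitting machine reads this same mapred-site.xml), listing jobs should go over RPC to the remote JobTracker rather than through a LocalJobRunner, and the JobTracker web UI on port 50030 should show the four TaskTrackers:

hadoop job -list    # connects to mapred.job.tracker, i.e. job.tracker.private.ip:9001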

    Are you really still running Hadoop 1? – OneCricketeer Apr 14 '17 at 00:41
  • Yes, it is required for OLAP operations with Titan, a graph database. – cscan Apr 14 '17 at 00:46
  • Only Tinkerpop needs Hadoop 1. https://github.com/thinkaurelius/titan/wiki/Downloads – OneCricketeer Apr 14 '17 at 03:09
  • @cscan What value do you have for `fs.default.name`? Can you post `core-site.xml`? – franklinsijo Apr 14 '17 at 05:03
  • @franklinsijo I've updated the question. – cscan Apr 14 '17 at 16:17
  • @cricket_007 Titan uses gremlin for OLAP operations. – cscan Apr 14 '17 at 16:17
  • I know that... I was reading that Titan development has effectively halted when the people behind it were acquired by DataStax, and now using DSE Graph would be the way to go. Or move to OrientDB if you want a free, scalable graph database. – OneCricketeer Apr 14 '17 at 16:49
  • @cricket_007 Yeah, it kinda sucks for companies like us that started with Titan prior to the acquisition especially as DSE graph doesn't actually have an implementation of the tinkerpop API... https://datastax-oss.atlassian.net/browse/JAVA-1250 – cscan Apr 17 '17 at 16:07
  • @franklinsijo I want to split the master and worker tasks as I am running this on a cluster. I did however try it and received the same error message. – cscan Apr 17 '17 at 18:07
  • Unfortunately I can't really help you with your Hadoop problem, but regarding the problem of being stuck with Titan: Titan was forked and now lives on as [JanusGraph](http://janusgraph.org/). The first official release of JanusGraph is actually [expected for today](https://groups.google.com/forum/#!topic/janusgraph-dev/zYST1cGGTW0). Here you can read more about the beginning of JanusGraph: https://www.datanami.com/2017/01/13/janusgraph-picks-titandb-left-off/ – Florian Hockmann Apr 19 '17 at 06:51
  • @FlorianHockmann This is excellent news, thank you. – cscan Apr 19 '17 at 17:50

1 Answer


The problem wasn't that Hadoop was running in local job mode; the problem was that Giraph, configured on another machine, assumed that Hadoop was running in local job mode.

I was submitting the job via Gremlin, so I needed to add the following line to its configuration file:

mapred.job.tracker=job.tracker.private.ip:9001
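
For context, a rough sketch of what such a Hadoop-Gremlin (HadoopGraph) properties file can look like is below. The property names come from TinkerPop 3.x Hadoop-Gremlin and Giraph; the graph input/output format entries depend on the Titan storage backend and are omitted, and the worker counts are placeholders rather than values from my setup:

# Hadoop-Gremlin / GiraphGraphComputer properties (sketch only)
gremlin.graph=org.apache.tinkerpop.gremlin.hadoop.structure.HadoopGraph

# graph input/output format classes go here (backend-specific, omitted)

# Giraph worker settings (placeholders; must fit within the cluster's task slots)
giraph.minWorkers=2
giraph.maxWorkers=2

# the fix: point the Gremlin side at the remote JobTracker instead of the LocalJobRunner
mapred.job.tracker=job.tracker.private.ip:9001
fs.default.name=hdfs://name.node.private.ip:9000

Whether fs.default.name also needs to be set here depends on whether the machine running Gremlin already picks up the cluster's core-site.xml; if it does, the mapred.job.tracker line alone should be enough.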