I am running a Hadoop job (using Hadoop 0.20.2) on a 6-machine setup: one machine is the namenode / secondary namenode / job tracker (master), and the other 5 machines are all datanodes / tasktrackers (slaves). The job has over 14,000 maps and is more than 10% complete. When I browse the job tracker's Job details page I see this:

Status: Running
Started at: Tue Jul 05 18:12:44 PDT 2011
Running for: 66hrs, 5mins, 4sec
Job Cleanup: Pending
Black-listed TaskTrackers: 1

I log in to the machine in question and I can see that the task tracker process is running, but the machine is not doing any work (the top command shows that CPU utilization is < 10%). I have already restarted the task tracker on that node with these commands:

./hadoop-daemon.sh stop tasktracker
./hadoop-daemon.sh start tasktracker

The node is still in the blacklist, though. The task tracker is running, but the machine is still not performing any work.

Question: Is there any way to tell Hadoop to "un-blacklist" the node and send tasks to it, preferably without having to restart the job?

PS. The node was confirmed to be running and performing tasks at the start of the job.

RobertoP

1 Answer

Put the following configuration in conf/hdfs-site.xml:

<property>
  <name>dfs.hosts</name>
  <value>/full/path/to/whitelisted/node/file</value>
</property>

Then use the following command to ask Hadoop to refresh the node status based on that configuration:

./bin/hadoop dfsadmin -refreshNodes
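
For reference, the file that dfs.hosts points to is a plain-text include list with one hostname per line. The following is only a minimal sketch of creating such a file: the path /tmp/hadoop-include-hosts and the hostnames slave1 to slave5 are hypothetical placeholders, not values from the setup above, and the refresh command is shown as a comment rather than executed.

```shell
#!/bin/sh
# Hypothetical path; it should match the <value> given for dfs.hosts.
INCLUDE_FILE="${INCLUDE_FILE:-/tmp/hadoop-include-hosts}"

# One hostname per line for every slave that is allowed to connect.
# slave1..slave5 are placeholder names for the five slaves.
cat > "$INCLUDE_FILE" <<'EOF'
slave1
slave2
slave3
slave4
slave5
EOF

# Print the file so the contents can be checked by eye.
cat "$INCLUDE_FILE"

# Afterwards, on the master, from the Hadoop install directory (not run here):
#   ./bin/hadoop dfsadmin -refreshNodes
```

Keeping the list in a standalone file means a host can be added or removed later by editing the file and refreshing again, without touching hdfs-site.xml.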
David Mathis
  • 1
    Thanks; this worked. I followed this, then I stopped and restarted the task tracker on the slave and the slave got back to work. – RobertoP Jul 13 '11 at 01:58
  • Any idea whether this should work if I put the setting in hdfs-site.xml while the job is running? I tried this, refreshed, then restarted the task tracker on the slave, but it is still not doing work and shows in the web interface as blocked. – Dolan Antenucci Jun 13 '12 at 01:40
  • 1
    I just noticed that there are two separate settings for defining hosts (dfs.hosts and mapred.hosts). I never figured out how to restore blacklisted tasktrackers for a job, but I was trying "dfs.hosts", which in hindsight, doesn't make sense (I should have tried mapred.hosts and then the restart). If Roberto or David have insight, please share :) – Dolan Antenucci Jun 19 '12 at 00:52