
I have a Hadoop cluster with ~7 machines, and some of the machines keep going down. Sometimes only the Hadoop datanode / jobtracker processes die (the machine is still running), and other times the entire machine goes down.

I haven't really debugged a situation like this before, so I'm wondering where I should start, for example which logs I should look into. The log files under the /logs/ directory (files like hadoop-dev-datanode-X.log) don't seem to have anything useful. Also, if the Linux machine itself goes down, where should I look for the error messages?

Jeeyoung Kim

0 Answers