Two of my three slave VMs are down and I can't SSH into them. We performed a hard reboot, but they are still down. Any idea how to bring them back, or how to debug this to find the cause? Here's the jps output:
3542 RunJar
9920 SecondaryNameNode
10094 ResourceManager
10244 NodeManager
8677 DataNode
31634 Jps
8536 NameNode
Here's another detail, the connections on port 8020:
ubuntu@anmol-vm1-new:~$ sudo netstat -atnp | grep 8020
tcp 0 0 10.0.1.190:8020 0.0.0.0:* LISTEN 8536/java
tcp 0 0 10.0.1.190:50957 10.0.1.190:8020 ESTABLISHED 8677/java
tcp 0 0 10.0.1.190:8020 10.0.1.190:50957 ESTABLISHED 8536/java
tcp 0 0 10.0.1.190:8020 10.0.1.193:46627 ESTABLISHED 8536/java
tcp 0 0 10.0.1.190:44300 10.0.1.190:8020 TIME_WAIT -
tcp 0 0 10.0.1.190:8020 10.0.1.190:44328 ESTABLISHED 8536/java
tcp 0 0 10.0.1.190:8020 10.0.1.193:44610 ESTABLISHED 8536/java
tcp6 0 0 10.0.1.190:44292 10.0.1.190:8020 TIME_WAIT -
tcp6 0 0 10.0.1.190:44328 10.0.1.190:8020 ESTABLISHED 10244/java
tcp6 0 0 10.0.1.190:44252 10.0.1.190:8020 TIME_WAIT -
tcp6 0 0 10.0.1.190:44247 10.0.1.190:8020 TIME_WAIT -
tcp6 0 0 10.0.1.190:44287 10.0.1.190:8020 TIME_WAIT -
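To see at a glance which hosts are actually connected to port 8020, the netstat output above can be reduced to the distinct remote addresses. This is only a rough sketch: `peers_on_8020` is a hypothetical helper name, and it assumes the column layout shown above with IPv4-style addresses (a literal IPv6 address would not split correctly on the colon).

```shell
# Hypothetical helper: reads `netstat -atn` / `netstat -atnp` output on stdin
# and prints each distinct remote host that holds an ESTABLISHED connection
# to local port 8020.
peers_on_8020() {
  # $4 = local address, $5 = foreign address, $6 = connection state
  awk '$6 == "ESTABLISHED" && $4 ~ /:8020$/ {
         split($5, parts, ":")   # strip the remote port
         print parts[1]
       }' | sort -u
}

# Usage:
#   sudo netstat -atnp | peers_on_8020
```

Run against the output pasted above, this lists only 10.0.1.190 (the local machine) and 10.0.1.193, so just one remote host is connected on that port.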
When I run the following command:
hadoop fsck /
the result is:
The filesystem under path '/' is CORRUPT
There are more details in this pastebin.
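For anyone reproducing this, the summary line alone does not say which files or blocks are affected. A sketch of commands that print more detail is below; it assumes the `hdfs` command is on the PATH (Hadoop 2 or later), and `hdfs_diagnose` is just a hypothetical wrapper name.

```shell
# Hypothetical wrapper around standard HDFS diagnostic commands.
hdfs_diagnose() {
  # Bail out early if the hdfs CLI is not available on this machine.
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs not found on PATH" >&2
    return 1
  fi
  # List only the files that have corrupt blocks.
  hdfs fsck / -list-corruptfileblocks
  # Show every file with its blocks and the DataNodes holding each replica.
  hdfs fsck / -files -blocks -locations
  # Show which DataNodes the NameNode currently considers live or dead.
  hdfs dfsadmin -report
}

# Usage:
#   hdfs_diagnose
```

The `dfsadmin -report` output should show whether the NameNode has marked the two unreachable slaves as dead, which would explain missing block replicas.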