
How do I troubleshoot and recover a Lost Node in my long running EMR cluster?

The node stopped reporting a few days ago. The host seems fine, and so does HDFS. I only noticed the issue in the Hadoop Applications UI.

Marsellus Wallace

1 Answer


EMR nodes are ephemeral, and you cannot recover a node once it has been marked as LOST. You can avoid this in the first place by enabling the 'Termination Protection' feature when launching the cluster.
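If the cluster is already running, termination protection can also be switched on afterwards. The sketch below assumes Python with boto3 installed and AWS credentials and region configured; `enable_termination_protection` is a hypothetical helper name, while `set_termination_protection` is the boto3 EMR client call. AWS documents this setting as protecting the cluster's EC2 instances from being shut down by accident or error, so treat it as a safeguard rather than a guarantee that YARN will never mark a node LOST.

```python
def enable_termination_protection(cluster_id, client=None):
    """Turn on termination protection for one EMR cluster ("job flow" in the API).

    Assumes boto3 and AWS credentials are available; a stub client can be
    passed in instead, e.g. for testing without touching AWS.
    """
    if client is None:
        import boto3  # imported lazily so the function can run with a stub client

        client = boto3.client("emr")
    client.set_termination_protection(
        JobFlowIds=[cluster_id],
        TerminationProtected=True,
    )
    return cluster_id
```

For example, call `enable_termination_protection("j-XXXXXXXXXXXXX")` with your own cluster ID (the value shown is a placeholder). At launch time, the equivalent is the `--termination-protected` option of `aws emr create-cluster`.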

To find out why the node was marked LOST, check the YARN ResourceManager logs and/or the instance controller logs of your cluster; they are the most likely place to find the root cause.
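A rough sketch of how these checks could be scripted in Python, assuming you run it on the master node: it asks the ResourceManager REST API (assumed to listen on the default port 8088) for nodes in the LOST state, and scans a ResourceManager log for state-change lines. The log location under `/var/log/hadoop-yarn/` and the exact wording `Node Transitioned from ... to LOST` are assumptions that may differ between EMR and Hadoop versions, and all helper names are hypothetical.

```python
import json
import re
import urllib.request

# Assumed shape of an RMNodeImpl state-change line, e.g.
# "2019-01-10 12:34:56,789 INFO ...RMNodeImpl (...): ip-10-0-0-12.ec2.internal:8041 Node Transitioned from RUNNING to LOST"
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")
TRANSITION_RE = re.compile(r"(\S+) Node Transitioned from (\w+) to LOST")


def lost_nodes_from_json(payload):
    """Return (host, healthReport) pairs for LOST nodes in a /ws/v1/cluster/nodes response."""
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [
        (n.get("nodeHostName"), n.get("healthReport", ""))
        for n in nodes
        if n.get("state") == "LOST"
    ]


def fetch_lost_nodes(rm_host="localhost", port=8088, timeout=10):
    """Query the ResourceManager REST API for nodes currently in the LOST state."""
    url = f"http://{rm_host}:{port}/ws/v1/cluster/nodes?states=LOST"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return lost_nodes_from_json(json.load(resp))


def scan_for_lost(lines):
    """Return (timestamp, node, previous_state) for every '... to LOST' transition line."""
    hits = []
    for line in lines:
        m = TRANSITION_RE.search(line)
        if m:
            ts = TS_RE.match(line)
            hits.append((ts.group(1) if ts else "", m.group(1), m.group(2)))
    return hits


def scan_log_file(path):
    """Scan one ResourceManager log file; the path is supplied by the caller."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return scan_for_lost(fh)
```

For example, `fetch_lost_nodes()` lists the LOST hosts with their last health report, and `scan_log_file("/var/log/hadoop-yarn/<resourcemanager log>")` shows when each node transitioned and from which state. That timestamp is what you would then look up in the instance controller logs, which are typically under `/emr/instance-controller/log/` on the master node.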

annunarcist