I have a recurring problem: I start an AWS EMR cluster, log in via SSH, and run spark-shell
to test some Spark code. Sometimes I lose my internet connection and PuTTY reports that the connection was lost,
but it seems the Spark-related processes keep running on the cluster. When I reconnect to the server and run spark-shell
again, I get a lot of these errors:
17/02/07 11:15:50 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1486465722770_0002_01_000003 on host: ip-172-31-0-217.eu-west-1.compute.internal. Exit status: 1. Diagnostics: Exception from container-launch.
Googling this error suggests memory allocation problems, but since I am using small nodes on a test cluster, I don't want to allocate more memory. I just want to release the resources held by the old session and restart spark-shell,
but I don't see any Spark processes running.
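For reference, this is roughly how I'm checking for leftover processes on the master node (the grep pattern is just my guess at what to look for):

    # list JVM processes on the master node
    jps -l
    # look for anything Spark-related
    ps aux | grep -i spark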
How can I fix this easily? Is there some other process I should try closing/restarting, like Hadoop, MapReduce, YARN, etc.? I'd rather not start a new cluster every time this happens.
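I'm guessing the orphaned spark-shell still holds a YARN application, so maybe something like this would release the containers, but I haven't verified that this is the right approach (the application ID below is taken from the container ID in the error above):

    # list applications YARN thinks are still running
    yarn application -list
    # kill the stale spark-shell application
    yarn application -kill application_1486465722770_0002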