
I have an Amazon EMR cluster running, to which I submit jobs using the spark-submit shell command.

The way I call it:

spark-submit --master yarn --driver-memory 10g convert.py

The convert.py script runs under PySpark with Python 3.4. After reading a text file into an RDD, calling any action such as .take(5), .first(), or .collect(), or creating a DataFrame from the RDD, leads to the following error:

18/03/26 20:17:53 WARN TaskSetManager: Lost task 0.3 in stage 0.0 (TID 3, ip-xx-xx-xx-xxx.ec2.internal, executor 4): ExecutorLostFailure (executor 4 exited caused by one of the running tasks) Reason: Container marked as failed: container_0000000000001_0001_01_000001 on host: ip-xx-xx-xx-xxx.ec2.internal. Exit status: 52. Diagnostics: Exception from container-launch.
Container id: container_0000000000001_0001_01_000001
Exit code: 52
Stack trace: ExitCodeException exitCode=52:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:972)
    at org.apache.hadoop.util.Shell.run(Shell.java:869)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1170)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:236)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:305)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:84)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

It only happens with one specific file (~900 MB in size). I was able to reproduce the issue using just the pyspark shell as well. Interestingly enough, performing the same steps in Scala via spark-shell works perfectly.

Could this be a problem with YARN? Also, memory shouldn't be an issue, since I was able to convert an 18 GB file with the same code.

Any guidance will be greatly appreciated.

Vlad
