In a previous question I asked here:
How to avoid gc overhead limit exceeded in a range query with GeoSpark?
I was trying to run that query on my local cluster, but it never really completed. I am now trying to run the same query on an AWS/EMR cluster, and after a few days I am still stuck, mostly with the configuration.
I am using a cluster of 1 master + 4 nodes, all with the same spec, m4.xlarge (4 vCPU | 16 GB), and this is the configuration I am passing:
UPDATE
This is the guide I am following: https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/
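For context, this is roughly how I derived the numbers in the Spark configuration below. It is only a back-of-the-envelope sketch in Python: the fixed 5 cores per executor and the 90%/10% heap/overhead split are my reading of the blog, everything else is plain arithmetic for this instance type.

    # Rough sizing for the m4.xlarge fleet, following my reading of the AWS blog
    # linked above (the 5-cores-per-executor rule and the 90%/10% split come from
    # the blog; the rest is my own arithmetic for this instance type).

    INSTANCE_RAM_GB = 16      # m4.xlarge
    CORE_NODES = 4            # worker nodes in the cluster
    EXECUTOR_CORES = 5        # the blog's recommended cores per executor
    EXECUTORS_PER_NODE = 1    # a 4-vCPU node only has room for one such executor

    total_executor_mem_gb = INSTANCE_RAM_GB / EXECUTORS_PER_NODE    # 16
    executor_memory_gb = total_executor_mem_gb * 0.90               # 14.4 -> spark.executor.memory
    memory_overhead_gb = total_executor_mem_gb * 0.10               # 1.6  -> spark.yarn.executor.memoryOverhead
    executor_instances = EXECUTORS_PER_NODE * CORE_NODES - 1        # 3    -> one slot kept free for the driver
    default_parallelism = executor_instances * EXECUTOR_CORES * 2   # 30   -> spark.default.parallelism

    print(executor_memory_gb, memory_overhead_gb, executor_instances, default_parallelism)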
Configuration for the cluster:
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.nodemanager.vmem-check-enabled": "false",
      "yarn.nodemanager.pmem-check-enabled": "false"
    }
  }
]
Configuration for spark application:
[
  {
    "Classification": "spark",
    "Properties": {
      "maximizeResourceAllocation": "false"
    }
  },
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.network.timeout": "150000s",
      "spark.driver.memory": "14.4g",
      "spark.executor.memory": "14.4g",
      "spark.executor.cores": "5",
      "spark.driver.cores": "5",
      "spark.executor.instances": "3",
      "spark.default.parallelism": "30",
      "spark.yarn.executor.memoryOverhead": "1.6g",
      "spark.yarn.driver.memoryOverhead": "1.6g",
      "spark.memory.fraction": "0.7",
      "spark.memory.storageFraction": "0.40",
      "spark.storage.level": "MEMORY_AND_DISK_SER",
      "spark.rdd.compress": "true"
    }
  }
]
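As a sanity check, something like the following (a minimal PySpark sketch, unrelated to the GeoSpark query itself) can be used to confirm which values actually reach the running application, rather than trusting the classification JSON alone:

    # Minimal sketch: print the Spark configuration the running application
    # actually sees, to confirm the spark-defaults classification was applied
    # and nothing (e.g. maximizeResourceAllocation) overrode it.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("conf-check").getOrCreate()

    for key, value in sorted(spark.sparkContext.getConf().getAll()):
        if key.startswith(("spark.executor", "spark.driver",
                           "spark.yarn", "spark.memory", "spark.default")):
            print(key, "=", value)

    spark.stop()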
I have re-read the configuration guidance, simplified my configuration, and now I get a new error:
ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container from a bad node: container_1564588184524_0001_01_000003 on host: ip-172-31-16-70.eu-west-1.compute.internal. Exit status: 52. Diagnostics: Exception from container-launch.
Any idea?