In a previous question I asked here:

How to avoid gc overhead limit exceeded in a range query with GeoSpark?

I was trying to run that query on my local cluster and it never really completed. Now I am trying to run the same query on an AWS/EMR cluster, but after a few days I am still stuck with it, especially with the configuration.

I am using a cluster of 1 master + 4 nodes, all with the same spec, m4.xlarge (4 vCPU | 16 GB), and this is the configuration I am passing:

UPDATE

This is the guide I am following: https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/
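
For context, this is roughly how I applied the blog's formulas to an m4.xlarge (4 vCPU, 16 GB), treating each node as a single executor. Note that I plugged in the raw instance RAM, which may well be my mistake, since the blog seems to start from yarn.nodemanager.resource.memory-mb instead:

    spark.executor.cores               = 5                         (fixed value suggested by the blog)
    executors per instance             = 1                         (my assumption: one executor per node)
    total memory per executor          = 16 GB / 1         = 16 GB
    spark.executor.memory              = 16 GB * 0.90      = 14.4 GB
    spark.yarn.executor.memoryOverhead = 16 GB * 0.10      = 1.6 GB
    spark.executor.instances           = (1 * 4 nodes) - 1 = 3
    spark.default.parallelism          = 3 * 5 * 2         = 30

spark.driver.memory and spark.driver.cores are simply copied from the executor values, as the blog suggests.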

Configuration for the cluster:

[
     {
       "Classification": "yarn-site",
       "Properties": {
         "yarn.nodemanager.vmem-check-enabled": "false",
         "yarn.nodemanager.pmem-check-enabled": "false"
       }
     }
]

Configuration for spark application:

[
     {
       "Classification": "spark",
       "Properties": {
         "maximizeResourceAllocation": "false"
       }
     },
     {
       "Classification": "spark-defaults",
       "Properties": {
         "spark.network.timeout": "150000s",
         "spark.driver.memory": "14.4g",
         "spark.executor.memory": "14.4g",
         "spark.executor.cores": "5",
         "spark.driver.cores": "5",
         "spark.executor.instances": "3",
         "spark.default.parallelism": "30",
         "spark.yarn.executor.memoryOverhead": "1.6g",
         "spark.yarn.driver.memoryOverhead": "1.6g",
         "spark.memory.fraction": "0.7",
         "spark.memory.storageFraction": "0.40",
         "spark.storage.level": "MEMORY_AND_DISK_SER",
         "spark.rdd.compress": "true"
       }
     }
 ]
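
If I am reading this right, each executor container is therefore asking YARN for

    spark.executor.memory + spark.yarn.executor.memoryOverhead = 14.4 GB + 1.6 GB = 16 GB

which is the entire RAM of an m4.xlarge, so I suspect the sizing itself may be part of the problem, although I am not sure how much of the instance memory EMR actually hands over to YARN.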

I have re-read the configuration guides and simplified my configuration, and now I get a new error:

ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container from a bad node: container_1564588184524_0001_01_000003 on host: ip-172-31-16-70.eu-west-1.compute.internal. Exit status: 52. Diagnostics: Exception from container-launch.

Exit status 52 seems to be Spark's out-of-memory exit code, so I guess the executors are running out of memory. Any ideas?

  • YARN on EMR only gets around 75% of your instance's RAM, but you are giving 16G in total to Spark, which cannot work. Check [this](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-task-config.html). – Lamanus Jul 30 '19 at 13:43
  • True, that was a bit of a desperation move :) I was using formulas and suggestions I found around, which are totally confusing me. – Randomize Jul 30 '19 at 13:46
  • However, I changed it to just 2g, but I still have the same problem. – Randomize Jul 30 '19 at 13:47
  • 2g for what? You set 3 executors with 9g each and 7g overhead = 48g. Usually the overhead is less than 10% of the defined memory. Even if you set both to 2g, 12G would be needed in total, and that is the maximum YARN allows. You also set extra options for Java memory and so on. – Lamanus Jul 30 '19 at 13:51
  • 2g for spark.yarn.executor.memoryOverhead and spark.yarn.driver.memoryOverhead, and 14g for spark.driver.memory and spark.executor.memory. – Randomize Jul 30 '19 at 13:56
  • There are so many wrong options. You should check the meaning of each option first, I think. [See this](https://spark.apache.org/docs/latest/configuration.html). – Lamanus Jul 30 '19 at 13:59
  • That is what I am trying to figure out. Most of those options come from this article: https://aws.amazon.com/blogs/big-data/best-practices-for-successfully-managing-memory-for-apache-spark-applications-on-amazon-emr/ which wasn't working for me though, so I started to add/change other options. – Randomize Jul 30 '19 at 15:47
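
Following up on Lamanus's comments above (YARN on an m4.xlarge apparently only gets around 12G of the 16 GB, per the EMR link): this is the kind of trimmed-down spark-defaults block I plan to try next. The exact values are my own guess, not something taken from the docs; the idea is simply to keep executor memory plus overhead (11g) below the YARN cap while leaving one node's worth of room for the driver:

[
     {
       "Classification": "spark-defaults",
       "Properties": {
         "spark.executor.memory": "10g",
         "spark.yarn.executor.memoryOverhead": "1g",
         "spark.executor.cores": "4",
         "spark.executor.instances": "3",
         "spark.driver.memory": "10g",
         "spark.driver.cores": "4",
         "spark.default.parallelism": "24"
       }
     }
]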

0 Answers