I'm using an EC2 Hadoop cluster that comprises 20 c3.8xlarge machines, each with 60 GB of RAM and 32 virtual CPUs. On every machine I set up the YARN and MapReduce settings as documented here https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hadoop-task-config.html, i.e. as shown below:
c3.8xlarge
Configuration Option                    Default Value
mapreduce.map.java.opts                 -Xmx1331m
mapreduce.reduce.java.opts              -Xmx2662m
mapreduce.map.memory.mb                 1664
mapreduce.reduce.memory.mb              3328
yarn.app.mapreduce.am.resource.mb       3328
yarn.scheduler.minimum-allocation-mb    32
yarn.scheduler.maximum-allocation-mb    53248
yarn.nodemanager.resource.memory-mb     53248
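For context, here is the container arithmetic I can derive from these settings (a rough sketch using the values above; whether -w should actually equal the total container count minus one depends on how Giraph maps workers to containers, which is part of my question):

```python
# Back-of-envelope container math from the c3.8xlarge settings above.
# All numbers are taken from the configuration table in the question.
node_mem_mb = 53248        # yarn.nodemanager.resource.memory-mb
map_container_mb = 1664    # mapreduce.map.memory.mb
nodes = 20                 # cluster size

# How many map-sized containers YARN can fit on one node.
containers_per_node = node_mem_mb // map_container_mb
print(containers_per_node)        # 32

# Total map-sized containers across the cluster.
total_containers = nodes * containers_per_node
print(total_containers)           # 640

# Giraph runs each worker (plus one master) as a map task, so one
# candidate upper bound for -w would be total_containers - 1,
# reserving a container for the Giraph master.
print(total_containers - 1)       # 639
```

I'm unsure whether this naive upper bound is the right criterion, or whether factors like per-worker heap (-Xmx) and vCPU limits should drive the choice instead.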
Now, what criteria should I use to determine the most appropriate number of workers for Giraph? That is, what number should I pass as the -w argument? Are those criteria related to the settings above?