I have a Cloudera cluster with 7 worker nodes, each with:
- 30 GB RAM
- 4 vCPUs
Here are some of the configurations I found (via Google) to be important for tuning the cluster's performance. I am running with:
- `yarn.nodemanager.resource.cpu-vcores` => 4
- `yarn.nodemanager.resource.memory-mb` => 17 GB (rest reserved for the OS and other processes)
- `mapreduce.map.memory.mb` => 2 GB
- `mapreduce.reduce.memory.mb` => 2 GB
- Running `nproc` => 4 (number of processing units available)
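For context, here is a rough back-of-the-envelope sketch of what these settings allow per node. It assumes the default of 1 vcore per map/reduce task and that the scheduler enforces both memory and vcores (with a memory-only resource calculator such as `DefaultResourceCalculator`, the vcore limit would not apply):

```python
# Back-of-the-envelope container math from the settings above (assumptions noted inline).
node_memory_mb = 17 * 1024      # yarn.nodemanager.resource.memory-mb (17 GB)
node_vcores = 4                 # yarn.nodemanager.resource.cpu-vcores
container_memory_mb = 2 * 1024  # mapreduce.map.memory.mb / mapreduce.reduce.memory.mb (2 GB)
container_vcores = 1            # assumed default of mapreduce.map|reduce.cpu.vcores

containers_by_memory = node_memory_mb // container_memory_mb   # 8
containers_by_vcores = node_vcores // container_vcores         # 4

# If the scheduler enforces both memory and vcores, the smaller figure wins.
containers_per_node = min(containers_by_memory, containers_by_vcores)
memory_used_gb = containers_per_node * container_memory_mb // 1024
print(f"{containers_per_node} containers per node, using {memory_used_gb} GB of 17 GB")
# -> 4 containers per node, using only 8 GB of the 17 GB
```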
Now my concern is: when I look at my ResourceManager, I see Available Memory as 119 GB, which is fine. But when I run a heavy Sqoop job and the cluster is at its peak, it uses only ~59 GB of memory, leaving ~60 GB unused.
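For what it's worth, that figure is in the same ballpark as the sketch above would predict if each node is capped at 4 containers. This is only one possible explanation (the number of mappers the Sqoop job launches also limits how many containers it asks for), so treat it as a hypothesis, not a diagnosis:

```python
# Hypothetical cluster-wide view, reusing the per-node figures from the sketch above.
nodes = 7
containers_per_node = 4            # vcore-limited, under the assumptions above
container_memory_gb = 2

available_gb = nodes * 17                                          # 119 GB, as the RM UI shows
allocated_gb = nodes * containers_per_node * container_memory_gb   # 56 GB
print(f"available: {available_gb} GB, allocated to map/reduce containers: {allocated_gb} GB")
# 56 GB plus a few GB for ApplicationMaster containers lands close to the ~59 GB observed.
```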
One way I can see to fix this unused-memory issue is to increase `mapreduce.map.memory.mb` and `mapreduce.reduce.memory.mb` to 4 GB, so that we can use up to 16 GB per node.
The other way is to increase the number of containers, though I am not sure how (a rough comparison of both options is sketched below).
- 4 cores × 7 nodes = 28 possible containers. With 3 being used by other processes, only 5 are currently available for the Sqoop job.
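To make the trade-off concrete, here is a hedged sketch of the per-node numbers under each option. It interprets "2 containers per core" as advertising 8 vcores per node via `yarn.nodemanager.resource.cpu-vcores` (an assumption on my part; CPU overcommit is one common way to allow more containers), and keeps all other figures from above:

```python
# Option 1: keep 4 containers per node (vcore-limited) but make each container 4 GB.
opt1_containers = 4
opt1_mem_gb = opt1_containers * 4              # 16 GB of the 17 GB per node

# Option 2: keep 2 GB containers but advertise 8 vcores per node
# (yarn.nodemanager.resource.cpu-vcores = 8, i.e. 2 per physical core),
# so the memory limit (17 GB // 2 GB = 8 containers) becomes the effective cap.
opt2_containers = min(8, 17 // 2)              # 8 containers
opt2_mem_gb = opt2_containers * 2              # 16 GB of the 17 GB per node

print(f"Option 1: {opt1_containers} containers, {opt1_mem_gb} GB per node")
print(f"Option 2: {opt2_containers} containers, {opt2_mem_gb} GB per node")
# Both options fill roughly the same memory per node; option 2 runs more tasks in
# parallel per core, while option 1 gives each task more room (the JVM heap in
# mapreduce.map|reduce.java.opts would need to be raised along with the container size).
```

Either way, the per-node ceiling stays at `yarn.nodemanager.resource.memory-mb` = 17 GB.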
What would be the right configuration to improve cluster performance in this case? Can I increase the number of containers, say to 2 containers per core, and is that recommended?
Any help or suggestions on the cluster configuration would be highly appreciated. Thanks.