I have a 4-node cluster and I'm exploring Cloudera in order to run a TPC-H benchmark on engines such as Spark, Hive, and Impala, among others. My cluster nodes are reasonable: each has a 4-core CPU, 8 GB of RAM, and 250 GB of disk.
I'm trying to install CDH 5 correctly through the web UI. Everything runs just fine: I'm able to install the tools that I need, always keeping the default role/tool distribution that the installer recommends. The problem is that when the installation ends, I always end up with several health problems and warnings!
I think most of them have to do with RAM. Most of the warnings are suggestions to increase memory for components on the nodes, such as heap sizes, which in turn leads to the warning "memory threshold overcommitted". I don't know whether it's better to ignore those suggestions or follow them. Despite all the bad health warnings, I applied all the suggested changes and loaded my data into Hive to start running some queries, but in some cases I just get stuck when the MapReduce jobs start!
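To show how I understand the overcommit warning, here is a rough sketch of the arithmetic in Python. The role names and sizes are made up for illustration, and the 0.8 threshold is an assumption on my part; only the 8 GB of RAM per node comes from my actual cluster.

```python
# Rough sketch of the per-node memory arithmetic, as I understand it.
# Only NODE_RAM_MB reflects my real hardware; everything else is hypothetical.

NODE_RAM_MB = 8 * 1024          # 8 GB of RAM per node
OVERCOMMIT_THRESHOLD = 0.8      # assumed fraction of RAM that roles may claim

# Illustrative memory allocations for the roles on one node, in MB.
role_memory_mb = {
    "DataNode heap": 1024,
    "NodeManager heap": 1024,
    "YARN container memory": 4096,
    "Impala daemon memory limit": 2048,
    "HiveServer2 heap": 1024,
}


def overcommit_report(roles, node_ram_mb, threshold):
    """Return (allocated MB, allowed MB, whether the node is overcommitted)."""
    allocated = sum(roles.values())
    limit = node_ram_mb * threshold
    return allocated, limit, allocated > limit


allocated, limit, overcommitted = overcommit_report(
    role_memory_mb, NODE_RAM_MB, OVERCOMMIT_THRESHOLD
)
print(f"allocated: {allocated} MB, allowed: {limit:.0f} MB, "
      f"overcommitted: {overcommitted}")
```

If this picture is right, every heap increase I accept pushes the allocated total further past what an 8 GB node can actually provide, which would explain why following the suggestions triggers the overcommit warning.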
Can anyone give some possible solutions/suggestions? Thanks in advance, and sorry for the long post!