
I have a 4-node cluster and I'm exploring Cloudera in order to run TPC-H benchmarks on Spark, Hive, Impala, and others. My cluster nodes are reasonable: each has a 4-core CPU, 8 GB of RAM, and 250 GB of disk.

I'm trying to install CDH 5 correctly through the web UI. Everything runs just fine and I'm able to install the various tools, always keeping the default role/tool distribution that the installer recommends. The problem is that when the installation ends, I always end up with several health problems and warnings!

I think most of it has to do with RAM; most of the warnings are suggestions to increase memory on node components (heap sizes and such), which leads to the warning "memory threshold overcommitted". I don't know whether it's better to ignore those suggestions or follow them. Despite all the bad health warnings, I applied all the suggested changes and loaded my data into Hive to start running some queries, but in some cases I just get stuck when the MapReduce jobs start!

Can anyone give me some possible solutions/suggestions? Thanks in advance, and sorry for the long post!

  • 8 × 4 = 32 GB of memory is really small in the grand scheme of things... Especially for Spark – OneCricketeer Mar 12 '17 at 04:05
  • You should add a LARGE node for the non-core-services -- Cloudera Manager, its monitoring services, Hue, Oozie *(required by Hue for some silly reason)*, etc. etc. etc. -- and also the Spark gateway, Spark history service, YARN JobHistory, etc. etc. -- and also Impala Catalog, etc. etc. etc. – Samson Scharfrichter Mar 12 '17 at 10:55
  • Note that 8 GB of RAM alone can be necessary for Hive Metastore service in case of heavy load. Same for HiveServer2. Same for each Impala daemon if you really want to do stress tests (and that's still for "small data"). – Samson Scharfrichter Mar 12 '17 at 10:57
  • Yes, I'm aware that I don't have the ideal specs, but what's strange is that I'm not able to run a simple select count(*) from TABLE; the MapReduce job just stays stuck at 100% map and 0% reduce – Mário Rodrigues Mar 12 '17 at 21:15

1 Answer


You can usually ignore memory-overcommitted warnings because most Java applications use only a fraction of their configured heap size. However, as cricket_007 and Samson Scharfrichter have noted, your setup is very small.

http://blog.cloudera.com/blog/2013/08/how-to-select-the-right-hardware-for-your-new-hadoop-cluster/ recommends the following specifications for DataNodes/TaskTrackers in a balanced Hadoop cluster:

  • 12-24 1-4 TB hard disks in a JBOD (Just a Bunch Of Disks) configuration
  • 2 quad-/hex-/octo-core CPUs, running at least 2-2.5 GHz
  • 64-512 GB of RAM
  • Bonded Gigabit Ethernet or 10 Gigabit Ethernet (the more storage density, the higher the network throughput needed)

The most likely reason your job is getting stuck is a lack of vcores. Look at the YARN web UI and see how many vcores you have available. If the number is low (under 5), your jobs will lack the slots needed to run any workload. For your cluster you can allot 3 vcores per node, giving you at least 12 vcores total. Vcores are not CPUs; think of a vcore as a slot for a mapper/reducer task or an application master. Each vcore needs at least 512 MB of memory (you have to account for the JVM).
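As a sketch, the per-node vcore and memory sizing above maps onto standard YARN properties like the ones below. The property names are standard YARN configuration; the values are assumptions for this 4-core/8 GB hardware, and in Cloudera Manager you would set the equivalent fields in the YARN service configuration rather than editing yarn-site.xml by hand:

```xml
<!-- yarn-site.xml (illustrative values for a 4-core, 8 GB node) -->
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <!-- 3 container slots per node; 4 nodes => 12 vcores cluster-wide -->
  <value>3</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <!-- give YARN ~4 GB per node, leaving the rest for the OS and other daemons -->
  <value>4096</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <!-- at least 512 MB per container, to leave headroom for the JVM -->
  <value>512</value>
</property>
```

With these values, 4 GB / 512 MB also works out to a sane upper bound of containers per node, so memory and vcores stay roughly in balance.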

See https://blog.cloudera.com/blog/2015/10/untangling-apache-hadoop-yarn-part-2/ for a fuller understanding of vcores and basic settings.

Other obvious steps are to turn off services you don't need and to shrink the heap sizes of the ones you do need, freeing up memory for actual workloads.

tk421