Hadoop does not do block-level balancing automatically. You can rebalance manually with the built-in balancer tool: https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/CommandsManual.html#balancer. Note that balancing HDFS is quite expensive if you have just added a small number of empty nodes to an otherwise full cluster, and in my experience it only does an alright job of balancing the HDFS blocks. Running the balancer multiple times can improve the overall balance. There are also alternative implementations that can do a better job of balancing than the one built into Hadoop.
You can inspect the balance of blocks from the HDFS NameNode UI by clicking the "Live Nodes" link. The "Block Pool Used" column is the one to look at. If the percentage used varies widely from machine to machine, you may need to rebalance your HDFS cluster.
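As a rough way to turn "high variance" into a number, here is a minimal Python sketch that compares each node's utilization with the overall cluster utilization, which is how the HDFS balancer's threshold (10 percent by default) is defined. The function name, the node names, and the sample figures are all made up for illustration; you would copy the real used/capacity values from the "Live Nodes" page yourself.

```python
def nodes_outside_threshold(nodes, threshold_pct=10.0):
    """Return (cluster utilization %, {node: deviation in points}).

    nodes maps a hypothetical node name to (used, capacity) in the same unit.
    A node is reported when its utilization differs from the cluster-wide
    utilization by more than threshold_pct percentage points.
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity

    outliers = {}
    for name, (used, capacity) in nodes.items():
        deviation = 100.0 * used / capacity - cluster_pct
        if abs(deviation) > threshold_pct:
            outliers[name] = round(deviation, 1)
    return cluster_pct, outliers


# Made-up numbers: four fairly full nodes plus one recently added node.
sample_nodes = {
    "dn1": (70, 100),
    "dn2": (72, 100),
    "dn3": (68, 100),
    "dn4": (75, 100),
    "dn5": (15, 100),
}
cluster_pct, outliers = nodes_outside_threshold(sample_nodes)
print(cluster_pct, outliers)
```

If the returned dictionary is non-empty, the cluster is probably outside the balancer's default tolerance and a balancer run is worth considering.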
The balance_switch command only affects region server balance. HBase automatically balances the regions in your cluster by default, but you can manually run the balancer at any time from the hbase shell.
You can inspect the region balance from the main page of the HBase master UI: under the "Region Servers" section, the "Load" column contains a value named "numberOfOnlineRegions". In general, HBase does a pretty good job of keeping this balanced. I've only seen the default balancing algorithm come up with a skewed set of regions a few times, right after I had created tables. Regardless, the region balancer is fairly cheap and runs quickly. Running it once is usually sufficient to get you into a very balanced state.
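If you want a quick sanity check on those "numberOfOnlineRegions" values, something like the following Python sketch can flag servers that sit far from the average. This is not HBase's own balancing algorithm; the function name, the server names, the counts, and the 20 percent tolerance are assumptions picked purely for illustration.

```python
import math


def skewed_region_servers(region_counts, tolerance=0.2):
    """Return (average regions per server, {server: count} for outliers).

    region_counts maps a hypothetical server name to its
    numberOfOnlineRegions value. A server is reported when its count falls
    outside average * (1 +/- tolerance), rounded outward to whole regions.
    """
    average = sum(region_counts.values()) / len(region_counts)
    low = math.floor(average * (1 - tolerance))
    high = math.ceil(average * (1 + tolerance))
    skewed = {
        server: count
        for server, count in region_counts.items()
        if count < low or count > high
    }
    return average, skewed


# Made-up counts read off the master UI.
sample_counts = {"rs1": 31, "rs2": 29, "rs3": 30, "rs4": 50, "rs5": 10}
average, skewed = skewed_region_servers(sample_counts)
print(average, skewed)
```

An empty result suggests the regions are spread evenly enough; otherwise running the balancer from the hbase shell once is likely to fix it.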