Say I start a cluster on Amazon Elastic MapReduce with one master node instance, 2 core node instances, and 15 task node instances.
I believe I have uploaded around 1 TB of data into HBase using MapReduce jobs and incremental uploads.
Now -
How do I find the table size and the region splits (in bytes)? Normally on CDH I would run hadoop fs -du /hbase, but there is no /hbase directory on my master node.
I am also curious how region server allocation works. Even with 100 regions, if I have only 1 master node, does that mean the whole IO will be throttled through it?
Thanks and regards