
I have an HBase table (written through Apache Phoenix) that needs to be read and written out to a flat text file. The current bottleneck: because the Phoenix table has 32 salt buckets, only 32 mappers are opened to read it, and once the data grows past 100 billion rows the job becomes very time consuming. Can someone point me to how to control the number of mappers per region server when reading an HBase table? I have also seen the program at https://gist.github.com/bbeaudreault/9788499, but it does not include a driver program that explains things fully. Can someone help?

1 Answer


In my observation, the number of mappers opened by the framework equals the number of regions in the table.

So reducing the number of regions will in turn reduce the number of mappers.

How can this be done?

1) Pre-split the HBase table at creation time, for example on the row-key prefixes 0-9.

2) Load all the data into these regions by generating row keys with a prefix between 0 and 9.
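The pre-split in step 1 can be sketched as follows. This is a minimal, self-contained sketch: the runnable part only computes the nine split keys ("1".."9", giving ten regions for prefixes 0-9), and the commented-out HBase client call shows where they would be used; the table and column family names there are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Computes split keys for pre-splitting a table on row-key prefixes 0-9.
// Nine boundary keys ("1".."9") yield ten regions: [,1), [1,2), ..., [9,).
public class SaltSplits {

    static byte[][] splitKeys() {
        byte[][] keys = new byte[9][];
        for (int i = 1; i <= 9; i++) {
            keys[i - 1] = String.valueOf(i).getBytes(StandardCharsets.UTF_8);
        }
        return keys;
    }

    public static void main(String[] args) {
        for (byte[] key : splitKeys()) {
            System.out.println(new String(key, StandardCharsets.UTF_8));
        }
        // With the HBase client API (hypothetical table/family names),
        // the keys would be passed at table-creation time, e.g.:
        //   TableDescriptor desc = TableDescriptorBuilder
        //       .newBuilder(TableName.valueOf("MY_TABLE"))
        //       .setColumnFamily(ColumnFamilyDescriptorBuilder.of("cf"))
        //       .build();
        //   admin.createTable(desc, SaltSplits.splitKeys());
    }
}
```

With row keys prefixed 0-9 as in step 2, each prefix range then lands in its own pre-created region.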

Below are various ways to do splitting:

[image: table of HBase splitting options]

Also, have a look at apache-hbase-region-splitting-and-merging

Moreover, setting the number of mappers does not guarantee that many will actually be opened; the count is driven by the input splits.

You can suggest a number of mappers using setNumMapTasks (old JobConf API) or conf.set("mapred.map.tasks", "numberofmappersyouwanttoset"), but it is only a suggestion to the framework.
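To make the "only a suggestion" point concrete, here is a small plain-Java sketch (no Hadoop dependency) of how an HBase input behaves: TableInputFormat creates one split per region, so the split count, not the hint, decides how many mappers run. The helper method here is hypothetical, purely for illustration.

```java
// Illustrates why mapred.map.tasks is only a hint for an HBase source:
// TableInputFormat creates one input split per region, and the framework
// runs one mapper per split, regardless of the requested count.
public class MapperCount {

    // Hypothetical helper: the effective mapper count for an HBase scan
    // is the number of regions, whatever hint was requested.
    static int effectiveMappers(int requestedHint, int numRegions) {
        return numRegions; // the hint is ignored; splits drive the count
    }

    public static void main(String[] args) {
        // 32 salt buckets -> 32 regions -> 32 mappers, even if we ask for 256.
        System.out.println(effectiveMappers(256, 32)); // prints 32
    }
}
```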

About the link you provided: I don't know what it is or how it works; you could check with its author.

Ram Ghadiyaram