I have an HBase table (written through Apache Phoenix) that needs to be read and written out to a flat text file. The current bottleneck is that, because the Phoenix table has 32 salt buckets, the job opens only 32 mappers to read it, and once the data grows past 100 billion rows this becomes time consuming. Can someone point me to how to control the number of mappers per region server when reading an HBase table? I have also seen the program at "https://gist.github.com/bbeaudreault/9788499", but it does not have a driver program that explains it fully. Can someone help?
-
Vijay: to reduce the number of mappers you can reduce the number of regions in the way below. Was it helpful? – Ram Ghadiyaram Sep 30 '16 at 06:04
-
In my experience it should work; please let me know whether the solution works for you. – Ram Ghadiyaram Oct 02 '16 at 07:55
1 Answer
In my observation, the number of regions in the table equals the number of mappers opened by the framework, so reducing the number of regions will in turn reduce the number of mappers.
How can this be done:
1) Pre-split the HBase table at creation time, for example into regions 0-9.
2) Load all the data into these regions by generating a row-key prefix between 0 and 9.
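The two steps above can be sketched in plain Java. This is only an illustrative sketch of the split-key and salted-row-key logic (the bucket count of 10 and the `user123` key are made-up examples, not from the question's actual table):

```java
public class SaltedSplits {
    // Split keys "1".."9" give ten regions: [,1), [1,2), ..., [9,).
    // These byte arrays would be passed to the table-creation API.
    static byte[][] splitKeys(int buckets) {
        byte[][] splits = new byte[buckets - 1][];
        for (int i = 1; i < buckets; i++) {
            splits[i - 1] = Integer.toString(i).getBytes();
        }
        return splits;
    }

    // Prefix each row key with its bucket (hash mod buckets) so writes
    // spread evenly across the pre-split regions.
    static String saltedKey(String rowKey, int buckets) {
        int bucket = Math.abs(rowKey.hashCode() % buckets);
        return bucket + "_" + rowKey;
    }

    public static void main(String[] args) {
        for (byte[] k : splitKeys(10)) {
            System.out.println(new String(k));
        }
        System.out.println(saltedKey("user123", 10));
    }
}
```

With ten regions created this way, the framework would open ten mappers instead of one per salt bucket of the original layout.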
There are various ways to do the splitting; also have a look at apache-hbase-region-splitting-and-merging.
Moreover, setting the number of mappers does not guarantee that that many will be opened; the count is driven by the input splits. You can suggest a number of mappers using setNumMapTasks or conf.set("mapred.map.tasks", "numberofmappersyouwanttoset") (but it is only a hint to the configuration).
About the link you provided, I don't know what it is or how it works; you can check with the author.
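Since the question mentions that the gist lacks a driver, the usual shape of a TableMapper driver looks roughly like the sketch below. This is a generic outline, not the gist author's code: the table name "MY_TABLE", the mapper class MyExportMapper, and the output types are placeholders, and it needs HBase/Hadoop on the classpath plus a running cluster to actually execute.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class ExportDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Only a hint; the real mapper count follows the input splits,
        // i.e. one split per region of the scanned table.
        conf.set("mapred.map.tasks", "64");

        Job job = Job.getInstance(conf, "hbase-to-flat-file");
        job.setJarByClass(ExportDriver.class);

        Scan scan = new Scan();
        scan.setCaching(500);        // larger scanner caching for MR jobs
        scan.setCacheBlocks(false);  // don't pollute the region server block cache

        // Wires the table as input: one mapper per region of MY_TABLE.
        // MyExportMapper is a placeholder extending TableMapper<NullWritable, Text>.
        TableMapReduceUtil.initTableMapperJob(
                "MY_TABLE", scan, MyExportMapper.class,
                NullWritable.class, Text.class, job);

        job.setOutputFormatClass(TextOutputFormat.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The mapper itself would just emit each row formatted as a text line, and the reducer can be omitted (map-only job) so each mapper writes its own part file.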

Ram Ghadiyaram