
I am using 5 region servers in my HBase cluster. I am storing the MD5 hash of the URL as the row key, with a single column family containing one `data` field that holds the data corresponding to the key (each row holds around 30 KB). My workload is read intensive (very few writes, very large reads). I have benchmarked the cluster with around 300,000 entries, using 5 pre-splits (to store the data uniformly across the 5 region servers), and I am getting a QPS of around 200. In the benchmark I ran 150 threads issuing reads from a separate client box.
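For reference, a minimal sketch of how a table with this layout could be created with five pre-split regions, assuming the HBase 1.x Java client and a hex-encoded MD5 row key (the table name `url_store` and family name `d` are placeholders, not from the question):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class CreatePreSplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            // One table, one column family ("d") whose "data" qualifier holds the ~30 KB value
            HTableDescriptor table = new HTableDescriptor(TableName.valueOf("url_store"));
            table.addFamily(new HColumnDescriptor("d"));

            // Four split points -> five regions, roughly uniform over hex MD5 keys.
            // Assumes the row key is a lower-case hex string; with raw 16-byte MD5
            // keys the split points would be byte values instead.
            byte[][] splits = new byte[][] {
                Bytes.toBytes("3"),
                Bytes.toBytes("6"),
                Bytes.toBytes("9"),
                Bytes.toBytes("c")
            };
            admin.createTable(table, splits);
        }
    }
}
```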

This QPS is too low for me. What optimizations can be done to improve the read QPS? (It is fine if the write QPS decreases as a result of the optimization.) As of now I am using the default configuration for HBase. Each region server, including the master, has 8 GB of RAM and 4 cores, and my cluster is in AWS, in the same zone.
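For context, a benchmark client like the one described might look roughly like the sketch below, using the HBase 1.x Java API (the table name, key generation, and per-thread counts are illustrative assumptions, not the asker's actual code). One client-side detail that matters for read QPS is sharing a single `Connection` across all threads rather than creating one per request:

```java
import java.security.MessageDigest;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadBenchmark {
    private static final AtomicLong reads = new AtomicLong();

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // One Connection shared by every thread; opening a connection per request kills throughput
        try (Connection conn = ConnectionFactory.createConnection(conf)) {
            ExecutorService pool = Executors.newFixedThreadPool(150);
            long start = System.currentTimeMillis();
            for (int t = 0; t < 150; t++) {
                final int threadId = t;
                pool.submit(() -> {
                    // Table instances are lightweight and not thread-safe, so one per thread
                    try (Table table = conn.getTable(TableName.valueOf("url_store"))) {
                        for (int i = 0; i < 2000; i++) {  // 150 threads x 2000 reads = 300,000 reads
                            Get get = new Get(Bytes.toBytes(keyFor(threadId * 2000 + i)));
                            Result r = table.get(get);
                            if (!r.isEmpty()) {
                                reads.incrementAndGet();
                            }
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            long elapsedMs = System.currentTimeMillis() - start;
            System.out.println("read QPS = " + (reads.get() * 1000.0 / elapsedMs));
        }
    }

    // Placeholder key generator: stands in for however the real benchmark picks its MD5 row keys
    private static String keyFor(int i) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(("http://example.com/" + i).getBytes("UTF-8"));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```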

Harsh Sharma
  • Please provide more information about the benchmarks you've performed. – Rubén Moraleda Jan 30 '15 at 11:57
  • @RubénMoraleda made changes in the question. – Harsh Sharma Jan 30 '15 at 12:26
  • The performance is quite poor, it should be a lot more. Is the family compressed? Is the client box on the same network? Please note that 200 QPS on 30 KB rows is about 6 MB/s (nearly 50 Mbit/s). I'd try to run the benchmark on one of the region servers, or even run multiple clients at the same time (with a third of the threads each) to see what happens. I'd also try reducing and increasing the number of threads to see the results. I can't think of much else you can do other than reducing the amount of data read and transferred by reading only the minimum set of columns you need in each case. – Rubén Moraleda Jan 30 '15 at 19:07
  • @RubénMoraleda Yeah, I will try to experiment with the number of threads and with multiple clients. Also, my cluster is in AWS in the same network (zone). But I don't get the part "_to reduce the amount of data read & transferred by reading only the minimum set of columns you need in each case_"; as my table has only one column, `data`, how can I reduce the amount of data read and transferred? Also, can you suggest any changes I can make in the configuration (like the block cache configuration, etc.) to improve the result? **PS: made changes in the question about the system configuration of the HBase cluster** – Harsh Sharma Feb 01 '15 at 18:37
  • 1
  • I meant that if you can split that row into multiple columns, you could retrieve only the ones you need instead of the full row, but if you need everything there's no point in it. Regarding config, you can [find here](http://hortonworks.com/blog/hbase-blockcache-101/) some information about the block cache, but I don't really think there's much you can do server-side other than a little fine-tuning (a sketch of family-level changes follows these comments). Check what happens with more concurrent clients and fewer threads each, and with fewer/more region servers, to see what the impact is, and finally, try increasing the RAM and Java heap of all region servers (8 GB is too low). – Rubén Moraleda Feb 02 '15 at 06:02
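To make the compression and block-cache suggestions in the comments concrete, here is a hedged sketch (same assumed table and family names as above, HBase 1.x `Admin` API; Snappy being installed on the region servers is an assumption) of enabling family-level compression and row bloom filters, which trade some CPU for less disk and network I/O on reads:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.io.compress.Compression;
import org.apache.hadoop.hbase.regionserver.BloomType;

public class TuneFamilyForReads {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {

            TableName name = TableName.valueOf("url_store");
            HColumnDescriptor family = new HColumnDescriptor("d");
            family.setCompressionType(Compression.Algorithm.SNAPPY); // assumes Snappy native libs on the region servers
            family.setBloomFilterType(BloomType.ROW);                // row blooms help point Gets skip HFiles
            family.setBlockCacheEnabled(true);                       // on by default; shown for completeness

            admin.disableTable(name);
            admin.modifyColumn(name, family);
            admin.enableTable(name);
            // Existing HFiles only pick up the new compression after a major compaction
            admin.majorCompact(name);
        }
    }
}
```

If Snappy is not available, GZ is a slower but dependency-free alternative for the compression algorithm.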

0 Answers