To avoid hotspotting region servers in HBase, it is advised to avoid sequential row keys. One approach is to salt the first byte of the row key. I want to employ this technique in my client code.
Let's say I have n region servers, and each region server may hold up to m regions, so the total number of regions would be n*m. The value of the first byte, x, will be in the range 1 <= x <= n*m.
On the write path, when inserting data, I'd randomly generate a value of x and prepend it to my row key. That should help distribute the keys evenly.
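A minimal sketch of the write-path salting described above, in plain Java with no HBase dependency. The class and method names (`SaltedKeys`, `randomSalt`, `prepend`) are hypothetical, and `buckets` stands for n*m; it is assumed to fit in a single byte, so at most 255 when salts start at 1.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper, not part of the HBase client API.
final class SaltedKeys {
    private SaltedKeys() {}

    /**
     * Picks a random salt in [1, buckets]. The value is stored as a raw byte,
     * so salts above 127 appear negative in Java but keep their unsigned bit
     * pattern in the key.
     */
    static byte randomSalt(int buckets) {
        if (buckets < 1 || buckets > 255) {
            throw new IllegalArgumentException("buckets must be in [1, 255]");
        }
        return (byte) (1 + ThreadLocalRandom.current().nextInt(buckets));
    }

    /** Returns a new key consisting of the salt byte followed by the original key. */
    static byte[] prepend(byte salt, byte[] rowKey) {
        byte[] salted = new byte[rowKey.length + 1];
        salted[0] = salt;
        System.arraycopy(rowKey, 0, salted, 1, rowKey.length);
        return salted;
    }
}
```

With a purely random salt, a reader cannot tell which bucket holds a given key, so a point lookup has to try every salt value. That is the trade-off behind the question that follows.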
Q1: Should I actually be smarter about the salt generation strategy?
I need to perform range scans (time-series data). Since my data is scattered across several regions, I plan to issue n*m scan requests in parallel, each executing in its own thread. Once the results are back, I'll aggregate them in the client code.
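The read path could look roughly like the sketch below: one scan range per salt value, each submitted to a thread pool, with the partial results concatenated on the client. It uses plain Java only. `RangeScanner` is a hypothetical stand-in for whatever actually runs the scan (it is not an HBase interface), and `buckets` again stands for n*m with salts running from 1 to `buckets`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the HBase client API.
final class SaltedScan {
    private SaltedScan() {}

    /** Stand-in for a real range scan over [startRow, stopRow). */
    interface RangeScanner<T> {
        List<T> scan(byte[] startRow, byte[] stopRow) throws Exception;
    }

    /** Prefixes the key with the given salt byte. */
    static byte[] withSalt(int salt, byte[] key) {
        byte[] salted = new byte[key.length + 1];
        salted[0] = (byte) salt;
        System.arraycopy(key, 0, salted, 1, key.length);
        return salted;
    }

    /** Runs one scan per salt bucket in parallel and concatenates the results in bucket order. */
    static <T> List<T> scanAllBuckets(int buckets, byte[] start, byte[] stop,
                                      RangeScanner<T> scanner, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> futures = new ArrayList<>();
            for (int salt = 1; salt <= buckets; salt++) {
                final byte[] from = withSalt(salt, start);
                final byte[] to = withSalt(salt, stop);
                Callable<List<T>> task = () -> scanner.scan(from, to);
                futures.add(pool.submit(task));
            }
            List<T> merged = new ArrayList<>();
            for (Future<List<T>> future : futures) {
                merged.addAll(future.get()); // blocks until that bucket's scan finishes
            }
            return merged;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while scanning", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a bucket scan failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The merged list is ordered by bucket, not by the original row key, so time-ordered output would still need a sort or a k-way merge during the client-side aggregation.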
Q2: Is there a way to group those requests so that, instead of issuing one scan per region, I could issue one request per region server?
I know that Apache Phoenix does something similar under the covers, but I think it achieves this with coprocessors.