
I'm designing an HBase schema with a row key that starts with the domain name reversed, e.g., com.example.www. Although there are many more domains that end in .com than, say, .org or .edu, I assume that I don't have to manage splitting myself and can rely on HBase's automatic splitting to distribute the rows across regions, i.e., regions will split as they get too large.
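For concreteness, here's a minimal Java sketch of the reversed-domain key layout I'm describing (the helper name is just for illustration):

    import java.util.Arrays;
    import java.util.Collections;

    public class ReverseDomain {
        // "www.example.com" -> "com.example.www"
        static String reverseDomain(String host) {
            String[] parts = host.split("\\.");          // ["www", "example", "com"]
            Collections.reverse(Arrays.asList(parts));   // in-place: ["com", "example", "www"]
            return String.join(".", parts);
        }

        public static void main(String[] args) {
            System.out.println(reverseDomain("www.example.com")); // com.example.www
        }
    }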

I should end up with more regions whose keys start with com. than, say, org., but I assume that's okay, and the "com. regions" should end up distributed across my region servers, correct?

Is there an issue with load balancing here? In the 2011 HBase Schema Design video by Lars (the link goes directly to the section of interest), he discusses a schema design that also has the reverse domain at the beginning of the key. The video says that an MD5 hash of the reverse domain was used "for load balancing reasons".

I'm probably missing something... If some.website.com is just as likely to appear in my input as another.website.org, doesn't that mean each row is just as likely to hit one region (and even one region server) vs another?

Mark Rajcok

1 Answer


HBase will normally split a region in two at its midpoint when it reaches hbase.hregion.max.filesize (depending on the split policy). You can rely on automatic splitting, but you'll end up with odd and lexically uneven split points because of the nature of your row keys (lots of "com" domains versus few "org" domains).

It may not be your exact case, but think of this potential issue:

  • Starting with an empty table with just one region, you insert 145M domains sequentially, from the first com. domain to the last org. domain.
  • At the 80 million mark (a fictitious com.nnnn.www), the region automatically splits in two at "com.f*", resulting in two 40-million-row regions, and writes continue into region 2.
  • At the 120 million mark (a fictitious com.yyyy.www), the second region reaches the max filesize and splits in two at "com.p*", again into two 40-million-row regions, and writes continue into region 3.
  • The job ends with the 145M domains inserted; no more splits are performed.

In this case, regions 1 and 2 will store 40M rows each, but region 3 will store 65M rows (it would split at 80M, but it may never reach that amount). Also, since you'll always be writing to the last region (even with batching enabled), the job will be a lot slower than issuing batches of writes to multiple regions at the same time.

Another problem: imagine you later realize you also need to add .us domains (10M). Given this design they will all go to region 3, increasing the number of rows it hosts to 75M.


The common approach to ensure an even distribution of keys among regions is to prepend a few chars of the MD5 of the key (in this case the reversed domain name) to the row key. In HBase, the very first bytes of the row key determine which region will host it.
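As a quick sketch (not production code; the "|" separator and the 2-char prefix length are just one choice, matching the example further down), building such a salted key with the JDK's MessageDigest could look like this:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;

    public class SaltedKey {
        // Prepend the first byte of MD5(reversedDomain) as 2 hex chars,
        // producing keys of the form "xx|com.example.www".
        static String saltedRowKey(String reversedDomain) throws Exception {
            byte[] md5 = MessageDigest.getInstance("MD5")
                    .digest(reversedDomain.getBytes(StandardCharsets.UTF_8));
            String prefix = String.format("%02x", md5[0] & 0xff);
            return prefix + "|" + reversedDomain;
        }

        public static void main(String[] args) throws Exception {
            // Prints something like "3a|com.example.www" (prefix depends on the digest).
            System.out.println(saltedRowKey("com.example.www"));
        }
    }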

Just prepending a few chars of the MD5 is enough to prevent hotspotting as much as possible (one region getting too many writes) and to get good automatic splits, but it's generally recommended to pre-split tables to ensure an even better distribution.

If you prepend 2 chars of the MD5 to your row keys, you can pre-split the table with 15 split points: "10", "20", "30" ... up to "f0". That will create 16 regions, and if any of them needs to be automatically split it will be done at its midpoint, i.e., when the region starting at "a0" and ending at "af" reaches hbase.hregion.max.filesize it will be split at about "a8", and each of the resulting regions will store half of the "a" bucket.
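Just as a sketch, creating such a pre-split table with the HBase Java client could look like this (the table name "domains" and column family "d" are placeholders, not part of your schema):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Admin admin = conn.getAdmin()) {

                // 15 split points "10", "20", ..., "f0" -> 16 regions.
                String hexDigits = "123456789abcdef";
                byte[][] splits = new byte[hexDigits.length()][];
                for (int i = 0; i < hexDigits.length(); i++) {
                    splits[i] = Bytes.toBytes(hexDigits.charAt(i) + "0");
                }

                HTableDescriptor table = new HTableDescriptor(TableName.valueOf("domains"));
                table.addFamily(new HColumnDescriptor("d"));
                admin.createTable(table, splits);
            }
        }
    }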

This is an example of which region would host each row if you have 16 pre-split regions and 2-char-prefixed row keys:

- Region 1 ---------
0b|com.example4.www
- Region 2 ---------
10|com.example.www
1b|org.example.www
- Region 6 ---------
56|com.example3.www
- Region 10 ---------
96|org.example5.www
- Region 11 ---------
af|com.example5.www
- Region 14 ---------
d5|org.example3.www
db|com.example2.www
de|org.example2.www
- Region 16 ---------
fb|org.example4.www

With a lot more domains the distribution ends up much more even, and almost all regions store roughly the same number of domains.

In most cases having 8-16 pre-split regions will be more than enough, but if not, you can go for 32 or even 64 pre-split regions, up to a maximum of 256 (that would mean split points "01", "02", "03" ... "9f", "a0", "a1" ... up to "ff").
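If you want to generate those split points programmatically, a small helper along these lines would do (the method name is made up; numRegions must divide 256):

    import org.apache.hadoop.hbase.util.Bytes;

    public class HexSplits {
        // Evenly spaced 2-hex-char split points for numRegions buckets
        // (numRegions must be a divisor of 256 and >= 2).
        static byte[][] hexSplits(int numRegions) {
            int step = 256 / numRegions;
            byte[][] splits = new byte[numRegions - 1][];
            for (int i = 1; i < numRegions; i++) {
                splits[i - 1] = Bytes.toBytes(String.format("%02x", i * step));
            }
            return splits;
        }
        // hexSplits(16)  -> "10", "20", ..., "f0"   (16 regions)
        // hexSplits(256) -> "01", "02", ..., "ff"   (256 regions)
    }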

Rubén Moraleda
  • Thanks Rubén. Can you elaborate more on "odd and lexically uneven split points"? I still don't see why max.filesize isn't sufficient -- i.e., why splitting by size alone isn't sufficient. If I don't hash or salt, suppose I end up with essentially all "edu" domains in region 1, essentially all "com" domains in regions 2-9, and essentially all "org" domains in region 10 (assume I end up with 10 regions). Why is that not as good as spreading all domains across all 10 regions with md5/salting? That's the part I don't understand. Any insight about that would be greatly appreciated. – Mark Rajcok Jan 28 '15 at 23:12
  • @MarkRajcok I've improved my answer with an example; I hope it's clearer now. It's just a matter of distributing rows as evenly as possible across all the servers to get better performance and a schema that is better suited for exponential growth. Imagine you update a row whenever a domain receives a visitor: you'll get a lot more writes for .com domains since they're visited a lot more, and therefore their regions will receive a lot more requests. – Rubén Moraleda Jan 29 '15 at 09:38