Salting, hashing, and key reversal are recognised techniques for avoiding region hotspotting in HBase. Nevertheless, when I ingest 8000 records using salting and reversed keys (in two separate scenarios), all of my data still ends up in one region. Does the table need to be pre-split at creation in order to benefit from salting or key reversal when trying to avoid region hotspotting? Is there a way to ingest data into multiple regions without manually splitting regions in HBase tables?

CoolCK

1 Answer

Salting, hashing, etc. are just ways of designing your row key structure so that you can take advantage of a table that has already been split in accordance with your row-key design. By default, a table has only one region at creation unless you pre-split it. That region covers the entire range of row key values, so it doesn't matter how you've designed the row key: all records go to that one region, and whichever Region Server has been assigned to serve it will be the only one servicing requests.
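To illustrate why the key design alone changes nothing, here is a minimal Python sketch that models region lookup as a search over sorted region start keys. It is a toy model, not HBase client code: `salted_key` and `find_region` are hypothetical helpers, and the two-digit CRC-based salt prefix is just an assumed scheme.

```python
import bisect
import zlib

def salted_key(key: str, buckets: int) -> str:
    """Prefix the key with a two-digit salt from a stable hash (assumed scheme)."""
    salt = zlib.crc32(key.encode("utf-8")) % buckets
    return f"{salt:02d}-{key}"

def find_region(row_key: str, start_keys: list) -> int:
    """Return the index of the region whose [start, next_start) range holds row_key.

    start_keys must be sorted and begin with "" (the first region's empty start key).
    """
    return bisect.bisect_right(start_keys, row_key) - 1

keys = [salted_key(f"user{i}", 4) for i in range(8000)]

# Default table: one region covering the entire key space.
single = {find_region(k, [""]) for k in keys}

# Same keys, but the table is pre-split at the salt boundaries.
presplit = {find_region(k, ["", "01", "02", "03"]) for k in keys}

print("regions hit without pre-split:", sorted(single))
print("regions hit with pre-split:   ", sorted(presplit))
```

With a single start key, every salted key lands in region 0; only when the start keys line up with the salt prefixes do the writes fan out.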

If you ingest a lot of data and the default region grows above the region size limit, HBase will automatically split that region in half, so two Region Servers can accept requests. In your case, however, you only ingested 8000 records, which is far too little to trigger a region split. You really don't want to rely on HBase's automatic splits to spread the load, because a region only splits after it has absorbed all the writes by itself, which means hotspotting has already taken place.
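A rough back-of-the-envelope check makes the scale clear. The numbers below are assumptions for illustration: a 10 GiB split threshold (check `hbase.hregion.max.filesize` on your cluster) and a hypothetical average of 1 KiB stored per record. The actual split policy and threshold depend on your HBase version and configuration.

```python
# Assumed values for illustration only; verify against your own cluster config.
MAX_REGION_BYTES = 10 * 1024**3   # assumed split threshold: 10 GiB
AVG_ROW_BYTES = 1024              # hypothetical average stored size per record

def fraction_of_split_threshold(records: int,
                                avg_row_bytes: int = AVG_ROW_BYTES,
                                max_region_bytes: int = MAX_REGION_BYTES) -> float:
    """Estimate how full a single region is relative to the split threshold."""
    return records * avg_row_bytes / max_region_bytes

for n in (8_000, 800_000):
    print(f"{n} records -> {fraction_of_split_threshold(n):.4%} of the threshold")
```

Under these assumptions, both record counts fill only a small fraction of one region, so no size-based split would be expected.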

So pre-split your table at creation, and make sure the split points make sense for the way you have designed your row key.
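As a sketch of what matching the splits to the key design can look like, the Python snippet below derives split points for row keys that begin with a zero-padded two-digit salt and formats them into an HBase shell `create` command using the `SPLITS` option. The helper names, the table name `events`, and the column family `d` are all hypothetical; adapt the salt format to whatever your keys actually use.

```python
def salt_split_points(buckets: int) -> list:
    """Split keys for row keys that start with a zero-padded two-digit salt.

    N buckets need N-1 split points; the first region starts at the empty key.
    """
    if not 1 <= buckets <= 100:
        raise ValueError("two-digit salts support 1 to 100 buckets")
    return [f"{b:02d}" for b in range(1, buckets)]

def create_statement(table: str, family: str, buckets: int) -> str:
    """Build an HBase shell `create` command with one region per salt bucket."""
    splits = ", ".join(f"'{s}'" for s in salt_split_points(buckets))
    return f"create '{table}', '{family}', SPLITS => [{splits}]"

# Hypothetical table and column family names.
print(create_statement("events", "d", 4))
```

The printed command can be pasted into the HBase shell; the number of buckets used here must be the same one used when salting keys at write time, otherwise some regions stay empty or some buckets share a region.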

VS_FF
  • Thanks for the reply. I observed the same behaviour when I tried to ingest 800000 records. They were still ingested into one region server, even though I used salting. I suppose I have to pre-split my table at the time of table creation. – CoolCK Nov 02 '21 at 19:47
  • It doesn't matter how many records you ingest, what matters is the region size (in bytes). I think if your region size goes above the region size limit specified in HBase config, then HBase will split that region automatically. And YES, you do have to pre-split the table in order to take advantage of the Salting/etc. right away. – VS_FF Nov 02 '21 at 21:12
  • By the way, here's a simple thing you can do to understand this better: in the HBase UI, click on your table and then on your region (it should be the only region). Somewhere there you will see the range of row-key values that the region is responsible for. That should show you that regardless of how you salt, hash, etc., all of these values fall within the range that this region covers. And this region will NOT be split until it reaches the size specified in the max region size setting. – VS_FF Nov 03 '21 at 09:36