
I read about salting and how it is used for load balancing in the case of sequential keys. Basically, the salt should distribute sequential rows to different region servers.

I also read this article, which explains how to run MR jobs on salted tables.

So it advises generating the salt as:

StringUtils.leftPad(Integer.toString(Math.abs(keyCore.hashCode() % numberOfRegions)), 3, "0") + "|" + logicalKey

So you basically take the hash of the original key modulo the number of regions to get the salt.

You also need to pre-split the table based on the salt, so that each region contains rows with the same salt.
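For illustration, I understand the pre-splitting to look roughly like this (my sketch, not from the article; the table name, column family, and admin handle are placeholder assumptions):

    // Pre-split so that region i serves exactly the rows with salt prefix i.
    byte[][] splitKeys = new byte[numberOfRegions - 1][];
    for (int i = 1; i < numberOfRegions; i++) {
        splitKeys[i - 1] = Bytes.toBytes(StringUtils.leftPad(Integer.toString(i), 3, "0"));
    }
    HTableDescriptor descriptor = new HTableDescriptor(TableName.valueOf("my_table"));
    descriptor.addFamily(new HColumnDescriptor("cf"));
    admin.createTable(descriptor, splitKeys); // admin is an HBaseAdmin instance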

All of this seems reasonable. My question is, what happens when you add more region servers?

It is expected that you would also increase the number of regions, so you would have to change the split strategy so that the new regions follow the "one salt for all rows in a region" rule. You would also need to take the hash modulo the increased numberOfRegions.

All of that means I could break queries for rows that were added while the number of regions was smaller. For example, at the beginning you could be taking the hash modulo 10 (10 regions), and later modulo 50 (now 50 regions).
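To make the concern concrete (the hash values below are made up; real ones depend on String.hashCode()):

    String keyCore = "someKey";
    int oldSalt = Math.abs(keyCore.hashCode() % 10); // e.g. 3  -> row stored as "003|someKey"
    int newSalt = Math.abs(keyCore.hashCode() % 50); // e.g. 23 -> reads now look for "023|someKey"
    // Rows written under the old salt would never be found with the new formula.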

Can anyone please explain the full procedure for doing this salting/pre-splitting properly?

Kobe-Wan Kenobi

1 Answer


The salt is used to avoid hot-spotting a single region. In your case, numberOfRegions should be read as the number of regions involved in bulk writes with sequential keys; it does not need to coincide with the total number of regions in your cluster. If, for example, 10 regions can handle your write volume, use numberOfRegions equal to 10 in your formula, or 20 if you expect the write volume to double in the future. And you don't need to follow the rule of one salt for all rows in a region. You just need to find a number of regions sufficient to handle your write volume.
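To illustrate, the write path would look something like this (my sketch, not from the answer; the table handle, column family, and key are placeholder assumptions):

    int numberOfRegions = 10; // sized for write throughput, not for cluster size
    String logicalKey = "1450443300000-event42"; // hypothetical sequential key
    String salt = StringUtils.leftPad(
            Integer.toString(Math.abs(logicalKey.hashCode() % numberOfRegions)), 3, "0");
    Put put = new Put(Bytes.toBytes(salt + "|" + logicalKey));
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value"));
    table.put(put); // "table" is an open HTable for the salted table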

Furthermore, you no longer need to write a custom input table format as in the blog post you mentioned. You can specify several scans for a single MapReduce job, and data locality will be handled automatically. Each scan will produce input splits, one for each region containing data in that scan's range. See the example below.

 List<Scan> scans = new ArrayList<>();
 for (int i = 0; i < numberOfRegions; i++) {
     Scan scan = new Scan();
     scan.setBatch(500);
     scan.setAttribute(Scan.SCAN_ATTRIBUTES_TABLE_NAME, YOUR_TABLE_NAME);
     // Rebuild the zero-padded salt prefix that was used when writing the rows
     String regionSalt = StringUtils.leftPad(Integer.toString(i), 3, "0");
     scan.setStartRow(Bytes.toBytes(regionSalt + "|" + scanStart));
     scan.setStopRow(Bytes.toBytes(regionSalt + "|" + scanStop));
     scans.add(scan);
 }

 TableMapReduceUtil.initTableMapperJob(
            scans,
            YourMapper.class,
            Text.class,
            Text.class,
            job);
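For completeness, a minimal sketch of what YourMapper could look like (not part of the original answer; it only assumes the 3-character salt plus "|" prefix used above and the Text/Text output types from the job setup):

    public static class YourMapper extends TableMapper<Text, Text> {
        @Override
        protected void map(ImmutableBytesWritable rowKey, Result columns, Context context)
                throws IOException, InterruptedException {
            // Strip the 3-char salt and the "|" separator to recover the logical key
            String saltedKey = Bytes.toString(rowKey.get());
            String logicalKey = saltedKey.substring(4);
            context.write(new Text(logicalKey), new Text(columns.toString()));
        }
    }
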
Alexander Kuznetsov
  • Thank you for your answer, I understand all but one part - "And you don't need to follow the rule one salt for all rows in the region". If you look at the article I referenced, you will see the reason why rows in a region should have the same salt: to be able to SCAN by specifying STARTROW and ENDROW while using the records as input to MR jobs. Please correct me if I'm wrong, but I think it is necessary? One more question: how do you know what number of regions will be sufficient to handle the writes? – Kobe-Wan Kenobi Dec 18 '15 at 13:47
  • If you use SCAN with STARTROW and ENDROW, it will work regardless of whether STARTROW and ENDROW belong to the same region or not. As for how many regions you will need to serve your writes, it depends on many factors, such as your record size, cluster configuration, network, and hardware. Typically a region can handle from several thousand to several tens of thousands of write requests per second. – Alexander Kuznetsov Dec 18 '15 at 15:39
  • But how would you specify STARTROW and ENDROW if the keys are 1-timestamp1, 2-timestamp2, 3-timestamp3, 1-timestamp4 and you want to get all rows between ts1 and ts4? You would have to include the salt value in START and END too, and you can specify only one value for each. The idea with one salt per region is to enable data locality - each Hadoop TaskTracker processes only the data on its own node, so it modifies START and END to contain the appropriate salt at the beginning. http://blog.cloudera.com/blog/2015/06/how-to-scan-salted-apache-hbase-tables-with-region-specific-key-ranges-in-mapreduce/ – Kobe-Wan Kenobi Dec 18 '15 at 15:56
  • I added an example of setting up a MapReduce job for this case. – Alexander Kuznetsov Dec 18 '15 at 20:43
  • Wow, thanks for the added example, I didn't know something like this was possible. But is this a new option? Please forgive me for being skeptical, but the Cloudera post is relatively new (June this year), so I am wondering why they would explain that approach if this was possible. Can I be sure that I'm safe and that data locality will be achieved? Hadoop nodes won't try to process data that is not on their own node? – Kobe-Wan Kenobi Dec 20 '15 at 23:13
  • This option has been in HBase since version 0.94.5. The Cloudera post was not written by Cloudera engineers; it is a republished article. Data locality is achieved in the same way as for the single-scan API. – Alexander Kuznetsov Dec 21 '15 at 04:34
  • Thanks, I have accepted your answer and you have an up-vote for all the useful information. One more thing - will this also work if I set a FilterList for the scans? And can I find any of this information documented anywhere? I tried to google it, but no luck. These are the last questions from me, I promise; you've been more than helpful. – Kobe-Wan Kenobi Dec 21 '15 at 09:06
  • Yes, FilterList will work in this case. In general, whatever works for a single scan will work for multiple scans too. – Alexander Kuznetsov Dec 21 '15 at 09:35