My row key starts with a timestamp of the form "YYYYMMDDhhmmss", where 'ss' is always 00. Example: 20170603162100, which corresponds to 16:21 on 3rd June 2017. (Don't ask me why, but the timestamp has to be at the start of the key!)

This is per-minute data, so each minute has its own unique key.

This suffers from region hot-spotting: consecutive minutes produce lexicographically adjacent row keys, so they all end up on the same region server.

My read pattern: get the data for a single minute (not for an hour, a day, a month, or a year).

Say I have 10 region servers.

Here is a solution I am thinking of, which is a kind of salt (but deterministic, not random):

I look at the mm (minute) part and assign a salt based on it:

  • minute 00: prefix A to the row key
  • minute 01: prefix B to the row key
  • ..
  • minute 09: prefix J to the row key
  • minute 10: prefix A to the row key again
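The mapping described above can be sketched as a small function. This is only an illustration of the idea in the question: the names `NUM_BUCKETS`, `SALTS`, `salt_for`, and `salted_key` are hypothetical, and the bucket count of 10 is an assumption taken from the number of region servers mentioned.

```python
import string

# Assumption: one salt bucket per region server (10 in the question).
NUM_BUCKETS = 10
SALTS = string.ascii_uppercase[:NUM_BUCKETS]  # "ABCDEFGHIJ"


def salt_for(row_key: str) -> str:
    """Return the salt letter derived from the 'mm' part of a YYYYMMDDhhmmss key."""
    minute = int(row_key[10:12])  # characters 10-11 hold the minute
    return SALTS[minute % NUM_BUCKETS]


def salted_key(row_key: str) -> str:
    """Prefix the row key with its deterministic salt."""
    return salt_for(row_key) + row_key


print(salted_key("20170603162100"))  # minute 21 -> bucket 1 -> "B20170603162100"
```

Because the salt depends only on the minute, minutes 00, 10, 20, ... share prefix A, minutes 01, 11, 21, ... share prefix B, and so on.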

This way all 'A' keys should go to the first region server, and so forth. The advantage may be that all requests for a single minute go to the same region server, which is bearable for me, and the very next minute's requests go to a different region server.

Also, when retrieving, I won't have to do parallel reads, because I already know the salt.
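To illustrate the read side, here is a minimal sketch contrasting a deterministic salt with a random one. The function names and the bucket count are hypothetical and only mirror the scheme described in the question; no HBase client calls are shown.

```python
from datetime import datetime

# Assumption: 10 salt buckets, A through J, as in the question.
NUM_BUCKETS = 10


def base_key(when: datetime) -> str:
    """Format a minute as YYYYMMDDhhmmss with 'ss' fixed to 00."""
    return when.strftime("%Y%m%d%H%M") + "00"


def lookup_key(when: datetime) -> str:
    """Deterministic salt: the reader can rebuild the one exact key to fetch."""
    salt = chr(ord("A") + when.minute % NUM_BUCKETS)
    return salt + base_key(when)


def random_salt_candidates(when: datetime) -> list[str]:
    """Random salt: the reader does not know the prefix and must try every bucket."""
    return [chr(ord("A") + i) + base_key(when) for i in range(NUM_BUCKETS)]


minute = datetime(2017, 6, 3, 16, 21)
print(lookup_key(minute))                   # one key, one lookup
print(len(random_salt_candidates(minute)))  # 10 keys to probe
```

With the deterministic salt a single-minute read needs one lookup, whereas a random salt would need one lookup per bucket.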

Can someone tell me if I am going wrong somewhere?

user4560

1 Answer

Well, the English alphabet covers just 26 minutes, so I would probably suggest a two-letter salt; it should still distribute properly. (How many nodes do you have?)

Alternatively, you can just try removing the seconds from your row key and reversing it.
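One possible reading of the reversal idea is sketched below. The answer does not spell out the details, so the function name `reversed_key` and the choice to reverse the whole remaining string are assumptions.

```python
def reversed_key(row_key: str) -> str:
    """Drop the constant 'ss' part of a YYYYMMDDhhmmss key and reverse the rest."""
    without_seconds = row_key[:12]  # keep YYYYMMDDhhmm
    return without_seconds[::-1]


print(reversed_key("20170603162100"))  # "126130607102"
print(reversed_key("20170603162200"))  # "226130607102"
```

After reversal the key starts with the last digit of the minute, so consecutive minutes begin with different characters instead of sharing a long common prefix. A reader can still rebuild the key for any given minute by applying the same transformation.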

vvg
  • I have 10 region servers. That's why I suggested using A-J and then repeating A-J. Am I wrong somewhere? Also, the reverse idea is awesome! – user4560 Jun 03 '17 at 13:18
  • You'll definitely avoid hot-spotting with such an approach. However, I might be wrong, but I'm not sure there is a one-to-one mapping between region servers and the first letter of the row key (a few letters that are not lexicographically close could probably end up on the same region server). – vvg Jun 03 '17 at 14:37