Will my BigTable schema result in hotspotting?

Question

Here's my schema

Here's some example data

Rows with the row key structure $PipelineId--$PipelineRunTime will be written less often but with much larger data, though nowhere near the row size limit. Rows with the structure $ContentID--$ContentType--$PipelineName will be created much more often but with much less data.

This is how I plan to query BT

READ all labels for $PipelineName and $PipelineRunTime
IS $ContentID in labels for $PipelineName at any PipelineRunTime?
READ $ContentID return all labels for any $PipelineName
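For illustration, the three reads above can be sketched as a point lookup and prefix scans over lexicographically sorted row keys. This is only a minimal sketch: a plain Python list stands in for the table, the key values are made up, `prefix_scan` and `content_in_pipeline` are hypothetical helpers rather than Bigtable client calls, and it assumes the pipeline identifier in the first key layout is the same value as $PipelineName.

```python
import bisect

# Hypothetical row keys following the two layouts; a sorted list stands in
# for the table, since rows are kept in lexicographic key order.
rows = sorted([
    "pipeline-a--2021-10-07",
    "pipeline-a--2021-10-08",
    "content-1--image--pipeline-a",
    "content-1--text--pipeline-b",
    "content-2--image--pipeline-a",
])

def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with `prefix`, relying on the sort order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys sharing a prefix are contiguous, so we can stop here
        matches.append(key)
    return matches

def content_in_pipeline(sorted_keys, content_id, pipeline_name):
    """Query 2: scan the content's rows and filter on the pipeline suffix."""
    suffix = f"--{pipeline_name}"
    return any(k.endswith(suffix) for k in prefix_scan(sorted_keys, f"{content_id}--"))

# Query 1: one pipeline run is a single row, so it is a point lookup.
run_exists = "pipeline-a--2021-10-08" in rows

# Query 3: all labels for a content ID is a prefix scan on "$ContentID--".
labels_for_content = prefix_scan(rows, "content-1--")
```

Because $ContentType sits between $ContentID and $PipelineName in the key, the second query cannot be a single point lookup unless the content type is known in advance; in this sketch it becomes a short prefix scan plus a filter.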

Are your content IDs sequential? – guillaume blaquiere Oct 08 '21 at 07:30
@guillaumeblaquiere No, they are UUIDs – Daniel Kobe Oct 08 '21 at 17:49

score 0 · Answer 1 · edited Oct 11 '21 at 06:17

Hotspotting in BigTable depends on how keys are distributed and on the request rate for those keys. There are two things to consider:

How keys are distributed across the backends, and
Whether the hot keys are far apart in that distribution.

For example, if you have 1 million keys but requests frequently target only two of them, capacity is limited to one or two backends.

If the keys are sequential, one backend will probably serve both keys (and will hotspot at high request rates).
If the keys are not sequential, there is a chance that two different backends will serve them.
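The two cases above can be illustrated with a toy model. This is only a sketch under stated assumptions: the sorted key space is split into equal contiguous ranges, each imagined as being served by one backend, and `build_ranges` and `range_for` are hypothetical helpers, not part of any BigTable API. It uses 100,000 keys instead of 1 million to keep it quick.

```python
import bisect
import uuid

def build_ranges(keys, num_ranges):
    """Split the sorted keys into contiguous ranges; return each range's first key."""
    ordered = sorted(keys)
    size = -(-len(ordered) // num_ranges)  # ceiling division
    return [ordered[i] for i in range(0, len(ordered), size)]

def range_for(key, split_points):
    """Index of the contiguous range (one imagined backend) that owns `key`."""
    return max(bisect.bisect_right(split_points, key) - 1, 0)

# Sequential keys: two adjacent hot keys land in the same range.
sequential_keys = [f"{i:07d}" for i in range(100_000)]
seq_splits = build_ranges(sequential_keys, 10)
hot_seq = {range_for(k, seq_splits) for k in ("0000041", "0000042")}

# UUID keys: two hot keys may or may not share a range.
uuid_keys = [str(uuid.uuid4()) for _ in range(100_000)]
uuid_splits = build_ranges(uuid_keys, 10)
hot_uuid = {range_for(k, uuid_splits) for k in uuid_keys[:2]}

print(len(hot_seq))   # always 1: both hot keys share one range
print(len(hot_uuid))  # 1 or 2, depending on where the UUIDs fall
```

Either way, two hot keys can keep at most two backends busy, which is why the request distribution matters as much as the key layout.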

Since you are using time as part of the key, you should look into:

To understand the performance characteristics and access patterns, and to find out whether hotspotting will occur, run performance tests and use Key Visualizer, then apply optimizations if needed.
