Here's my schema: [image]

Here's some example data: [image]

Rows with the row key structure $PipelineId--$PipelineRunTime will be written less often but with much larger data, though nowhere near the per-row size limit. Rows with the structure $ContentID--$ContentType--$PipelineName will be created much more often but with much less data.

This is how I plan to query Bigtable:

  • READ all labels for $PipelineName and $PipelineRunTime
  • IS $ContentID in labels for $PipelineName at any PipelineRunTime?
  • READ $ContentID return all labels for any $PipelineName
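
As a rough illustration of how these three queries might map onto the two row key structures, here is a minimal in-memory sketch. It does not use the Bigtable client; a sorted Python list stands in for Bigtable's lexicographic row ordering. The helper names, the sample IDs, and the assumption that $PipelineName and $PipelineId are the same value are all hypothetical.

```python
from bisect import bisect_left

SEP = "--"  # separator taken from the row key structures above

def pipeline_row_key(pipeline_id, run_time):
    # $PipelineId--$PipelineRunTime
    return f"{pipeline_id}{SEP}{run_time}"

def content_row_key(content_id, content_type, pipeline_name):
    # $ContentID--$ContentType--$PipelineName
    return SEP.join([content_id, content_type, pipeline_name])

def prefix_scan(sorted_keys, prefix):
    """Return every key starting with prefix, relying on sort order like a range scan."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches

# Hypothetical sample rows, kept sorted the way Bigtable stores row keys.
keys = sorted([
    pipeline_row_key("pipeA", "2021-01-01T00:00"),
    pipeline_row_key("pipeA", "2021-01-02T00:00"),
    content_row_key("c42", "video", "pipeA"),
    content_row_key("c42", "video", "pipeB"),
    content_row_key("c43", "image", "pipeA"),
])

# Query 1: all labels for a pipeline and run time is a single-row lookup.
q1 = pipeline_row_key("pipeA", "2021-01-01T00:00") in keys

# Query 3: all labels for a content ID is a prefix scan on "$ContentID--".
q3 = prefix_scan(keys, "c42" + SEP)

# Query 2: is the content ID labelled by a given pipeline? Filter the scan by suffix.
q2 = any(key.endswith(SEP + "pipeA") for key in q3)
```

In this sketch, queries 2 and 3 both start from the content ID prefix, so they would touch only a narrow, contiguous key range.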
Daniel Kobe

1 Answer

In Bigtable, hot-spotting depends on how row keys are distributed and on the request rate those keys receive. There are two things to consider:

  1. How keys are distributed across the backends, and
  2. Whether the hot keys are far apart in that distribution.

For example, if you have 1 million keys and requests frequently target only two of them, the available capacity is limited to one or two backends.

  1. If the keys are sequential, one backend will likely serve both keys (and will hot-spot at high request rates).
  2. If the keys are not sequential, there is a chance that two different backends will serve them.
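
The two cases can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: keys are the integers 0 to 999,999, and each of 10 hypothetical backends owns one equally sized contiguous key range. Real Bigtable splits and rebalances ranges dynamically, so the numbers here are illustrative only.

```python
from bisect import bisect_right

def make_split_points(num_keys, num_backends):
    """Split the sorted key space [0, num_keys) into equal contiguous ranges."""
    step = num_keys // num_backends
    return [step * i for i in range(1, num_backends)]

def backend_for(key, split_points):
    """Return the index of the backend whose range contains key."""
    return bisect_right(split_points, key)

def backends_hit(hot_keys, split_points):
    """Return the set of backends that serve the given hot keys."""
    return {backend_for(key, split_points) for key in hot_keys}

# Assumed layout: 1 million keys over 10 backends.
splits = make_split_points(1_000_000, 10)

# Case 1: two adjacent hot keys fall in the same range.
sequential = backends_hit([500_001, 500_002], splits)

# Case 2: two distant hot keys fall in different ranges.
spread = backends_hit([100_001, 900_001], splits)

print(len(sequential), len(spread))  # prints "1 2"
```

With adjacent hot keys, all traffic lands on one backend; with distant keys, the load is shared by two.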

Since you are using time as part of the key, you should look into:

To understand the performance characteristics and access patterns, and to find out whether hot-spotting will occur, run performance tests and use Key Visualizer, then apply optimizations if needed.
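
Before running a full performance test, a rough offline check on a sample of accessed row keys can hint at skew. The sketch below is not Key Visualizer and uses no Bigtable API; the function names, the 4-character prefix bucketing, the 50% threshold, and the sample access log are all assumptions chosen for illustration.

```python
from collections import Counter

def bucket_share(accessed_keys, prefix_len=4):
    """Fraction of requests landing in each key-prefix bucket."""
    counts = Counter(key[:prefix_len] for key in accessed_keys)
    total = sum(counts.values())
    return {prefix: n / total for prefix, n in counts.items()}

def hot_buckets(accessed_keys, prefix_len=4, threshold=0.5):
    """Return the prefixes that receive more than threshold of all requests."""
    shares = bucket_share(accessed_keys, prefix_len)
    return sorted(p for p, share in shares.items() if share > threshold)

# Hypothetical access log: 8 reads of one pipeline row, 2 reads of content rows.
log = ["pipe1--2021-01-01"] * 8 + ["c42--video--pipe1", "c43--image--pipe1"]

print(hot_buckets(log))  # prints "['pipe']"
```

A bucket that dominates the log is a candidate for closer inspection in the real heatmap.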

Donnald Cucharo
JM Gelilio