Why is BigTable structured as a two-level hierarchy of "family:qualifier"? Specifically, why is this enforced rather than just having columns and, say, recommending that users name their qualifiers "vertical:column"?

I am interested in whether or not enforcing this enables some engineering optimizations or if this is strictly a design thing.

user3038457

1 Answer

There are a couple of advantages to column families:

  1. Queries become simpler: you can retrieve a whole group of column qualifiers by requesting a single column family.

  2. Bigtable has a notion of "locality groups". Locality groups allow a column family to be written to a separate file, which helps when some column families are accessed less frequently than others. This analysis of HBase vs. Bigtable has more information on locality groups.
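To make the first point concrete, here is a minimal in-memory sketch in Python. It is not how Bigtable is implemented and it does not use the Bigtable client; `ToyTable`, `read_family` and `read_by_prefix` are hypothetical names made up for illustration. It only shows the API-level difference between a declared family and a "family:qualifier" naming convention on flat columns, and says nothing about performance.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical model: row key -> family -> qualifier -> value."""

    def __init__(self, families):
        # Families are declared up front; qualifiers are free-form.
        self.families = set(families)
        self.rows = defaultdict(lambda: {f: {} for f in self.families})

    def put(self, row_key, family, qualifier, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][qualifier] = value

    def read_family(self, row_key, family):
        # One lookup returns every qualifier in the family,
        # including qualifiers the reader did not know in advance.
        return dict(self.rows[row_key][family])


def read_by_prefix(flat_row, prefix):
    # Flat-column alternative: the grouping exists only as a naming
    # convention, so each column name is checked against the prefix.
    return {
        name: value
        for name, value in flat_row.items()
        if name.startswith(prefix + ":")
    }


table = ToyTable(families=["profile", "clicks"])
table.put("user1", "profile", "name", "Ada")
table.put("user1", "clicks", "2017-10-23T07:43", "/home")
table.put("user1", "clicks", "2017-10-24T00:32", "/docs")

clicks = table.read_family("user1", "clicks")

flat_row = {
    "profile:name": "Ada",
    "clicks:2017-10-23T07:43": "/home",
    "clicks:2017-10-24T00:32": "/docs",
}
flat_clicks = read_by_prefix(flat_row, "clicks")
```

In the toy model the family is part of the schema, so a typo in a family name fails at write time, while a typo in a flat column prefix silently creates a new "group".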

Solomon Duskis
  • Hi Solomon Duskis, thanks for mentioning the concept of "locality groups". This might be an explanation why we didn't see an immediate performance increase in our benchmarks when we split our data over multiple column families (as reported here: https://stackoverflow.com/questions/46465762/bigtable-performance-influence-column-families). Could it be that our two column families were still stored in the same locality group? How is the grouping of column families in locality groups determined? Is this done dynamically based on the access pattern, or can we declare this explicitly? – bjorndv Oct 23 '17 at 07:43
  • Cloud Bigtable does not expose the notion of locality groups that the internal implementation of Bigtable has. All Cloud Bigtable column families are in a single locality group. – Solomon Duskis Oct 23 '17 at 11:50
  • How do queries become easier? Can you elaborate on this? It sounds again like a design/user-friendliness point rather than an engineering optimization. – user3038457 Oct 24 '17 at 00:32
  • Bigtable allows you to specify whatever you want for the column qualifier, as long as you have a column family. You can have predefined names for column qualifiers, but still request only the family to get the entire set. You can also define arbitrary qualifiers (and possibly use those names as data) and get the set of columns by the family name. – Solomon Duskis Oct 24 '17 at 12:56