How data is stored physically in Bigtable

Question

Let's assume a table named test:

                  cf:a          cf:b      yy:a      kk:cat
"com.cnn.news"    zubrava10     sobaka    foobar    -
"ch.main.users"   -             -         -         purrpurr

The first cell (cf:a, shown as "zubrava10") has 10 versions, one per timestamp ("zubrava1", "zubrava2"...).

How will the data of this table be stored on disk?

I mean, is the primary index always

("row","column_family:column",timestamp) ?

So will 10 versions of the same row for 10 timestamps be stored together? How is the entire table stored?

Is a scan for all values of a given column as fast as in column-oriented models?

SELECT cf:a from test

score 2 · Answer 1 · edited Jun 20 '20 at 09:12

So will 10 versions of the same row for 10 timestamps be stored together? How is the entire table stored?

Bigtable is a row-oriented database, so all data for a single row is stored together, organized by column family and then by column. Within each column, the versions are stored in reverse-timestamp order (newest first), so yes, the 10 versions of cf:a sit next to each other. That ordering makes it easy and fast to ask for the latest value, but harder to ask for the oldest value.
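To make that ordering concrete, here is a toy Python sketch that sorts the cells from the question by (row key, column family, column, newest timestamp first). It only imitates the sort order described above and is not how Bigtable is actually implemented; the integer timestamps 1..10 and the helper name sort_key are assumptions made for the example.

```python
# Toy model of the logical sort order only; timestamps 1..10 are made up.
def sort_key(cell):
    row, family, qualifier, timestamp, _value = cell
    # Negating the timestamp puts the newest version first within a column.
    return (row, family, qualifier, -timestamp)

cells = [
    ("com.cnn.news", "cf", "a", ts, f"zubrava{ts}") for ts in range(1, 11)
] + [
    ("com.cnn.news", "cf", "b", 1, "sobaka"),
    ("com.cnn.news", "yy", "a", 1, "foobar"),
    ("ch.main.users", "kk", "cat", 1, "purrpurr"),
]

layout = sorted(cells, key=sort_key)

# Because the newest version comes first, the first value seen
# for each (row, family, column) is the latest one.
latest = {}
for row, family, qualifier, _ts, value in layout:
    latest.setdefault((row, family, qualifier), value)

for cell in layout:
    print(cell)
```

In this model all ten versions of cf:a are adjacent, and reading the latest value means stopping at the first one, while reaching the oldest means walking past the other nine.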

Is a scan for all values of a given column as fast as in column-oriented models?
SELECT cf:a from test

No. A column-oriented storage model stores all the data for a single column together, across all rows. Thus, scanning one column across the whole table is faster in a column-oriented system (such as Google BigQuery) than in a row-oriented one, because the row-oriented system has to read through each row to reach that column. In exchange, a row-oriented system provides atomic single-row mutations, which a column-oriented storage system typically cannot.
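A rough way to see the difference is to count how many cells each layout touches to answer SELECT cf:a. The sketch below is a simplified in-memory model, not either system's real storage format, and it ignores versions, filters, and compression; the function names are hypothetical.

```python
# Latest values only, taken from the table in the question.
rows = {
    "ch.main.users": {"kk:cat": "purrpurr"},
    "com.cnn.news": {"cf:a": "zubrava10", "cf:b": "sobaka", "yy:a": "foobar"},
}

def scan_row_oriented(rows, column):
    """Walk every row and every cell in it, keeping only the wanted column."""
    values, touched = [], 0
    for row_key in sorted(rows):
        for col, value in rows[row_key].items():
            touched += 1
            if col == column:
                values.append(value)
    return values, touched

def to_column_layout(rows):
    """Regroup the same data so each column's values sit together."""
    columns = {}
    for row_key in sorted(rows):
        for col, value in rows[row_key].items():
            columns.setdefault(col, []).append((row_key, value))
    return columns

def scan_column_oriented(columns, column):
    """Read only the entries stored for the wanted column."""
    entries = columns.get(column, [])
    return [value for _row_key, value in entries], len(entries)

row_values, row_touched = scan_row_oriented(rows, "cf:a")
col_values, col_touched = scan_column_oriented(to_column_layout(rows), "cf:a")
print(row_values, row_touched)
print(col_values, col_touched)
```

Both scans return the same values, but the row-oriented walk visits every cell in the table while the column-oriented one visits only the cells of cf:a.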

On top of this, Bigtable keeps all row keys sorted in lexicographic order; column-oriented storage systems typically make no such guarantee.
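One practical consequence of sorted row keys is that all rows sharing a key prefix are adjacent, so a prefix lookup can seek to the first match and stop at the first non-match. The sketch below shows the idea with a sorted Python list and bisect; it is not a Bigtable client call, and the keys "com.cnn.sports" and "org.example.www" are made-up additions to the question's table.

```python
import bisect

# Row keys kept in lexicographic order, as described above.
row_keys = sorted([
    "com.cnn.news",
    "ch.main.users",
    "com.cnn.sports",   # hypothetical extra row
    "org.example.www",  # hypothetical extra row
])

def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with prefix, reading only the matching range."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # sorted order means no later key can match
        matches.append(key)
    return matches

print(prefix_scan(row_keys, "com.cnn."))
```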
