How does columnfamily from BigTable in GCP relate to columns in a relational database

Question

I am trying to migrate a table that is currently in a relational database to BigTable. Let's assume that the table currently has the following structure:

Table: Messages

Columns:

Message_id
Message_text
Message_timestamp

How can I create a similar table in BigTable? From what I can see in the documentation, BigTable uses ColumnFamily. Is ColumnFamily the equivalent of a column in a relational database?

https://cloud.google.com/bigtable/docs/schema-design#column-families can help understand column families. It's also important to design your row key depending on your needs. If you need to query your table using message_id, use message_id as your row key. You can also use a composite key (message_id and timestamp) if you need. — norbjd, Mar 20 '21 at 15:04

jccampanero · Answer 1 · 2021-03-22T13:15:53.947

BigTable is different from a relational database system in many ways.

Regarding database structures, BigTable should be considered a wide-column, NoSQL database.

Basically, every record is represented by a row and for this row you have the ability to provide an arbitrary number of name-value pairs.

This row has the following characteristics.

Row keys

Every row is identified univocally by a row key. It is similar to a primary key in a relational database. This field is stored in lexicographic order by the system, and is the only information that will be indexed in a table.
In the construction of this key you can choose a single field or combine several ones, separated by # or any other delimiter.
The construction of this key is the most important aspect to take into account when constructing your tables. You must thing about how will you query the information. Among others, keep in mind several things (always remember the lexicographic order):
- Define prefixes by concatenating fields that allows you to fetch information efficiently. BigTable allows and you to scan information that starts with a certain prefix.
- Related, model your key in a way that allows you to store common information (think, for example, in all the messages that come from a certain origin) together, so it can be fetched in a more efficient way.
- At the same time, define keys in a way that maximize dispersion and load balance between the different nodes in your BigTable cluster.

Column families

The information associated with a row is organized in column families. It has no correspondence with any concept in a relational database.
A column family allows you to agglutinate several related fields, columns.
You need to define the column families before-hand.

Columns

A column will store the actual values. It is similar in a certain sense to a column in a relational database.
You can have different columns for different rows. BigTable will sparsely store the information, if you do not provide a value for a row, it will consume no space.
BigTable is a third dimensional database: for every record, in addition to the actual value, a timestamp is stored as well.

In your use case, you can model your table like this (consider, for example, that you are able to identify the origin of the message as well, and that it is a value information):

Row key = message_origin#message_timestamp (truncated to half hour, hour...)¹#message_id

Column family = message_details

Columns = message_text, message_timestamp

This will generate row keys like, consider for example that the message was sent from a device with id MT43:

MT43#1330516800#1242635

Please, as @norbjd suggested, see the relevant documentation for an in-deep explanation of these concepts.

One important difference with a relational database to note: BigTable only offers atomic single-row transactions and if using single cluster routing.

¹ See, for instance: How to round unix timestamp up and down to nearest half hour?

How does columnfamily from BigTable in GCP relate to columns in a relational database

1 Answers1