Questions tagged [bigtable]

Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.

Bigtable

A Distributed Storage System for Structured Data

Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).

Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.

Some features

  • fast and extremely large-scale DBMS
  • a sparse, distributed multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases.
  • designed to scale into the petabyte range
  • it works across hundreds or thousands of machines
  • it is easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration
  • each table has multiple dimensions (one of which is a field for time, allowing versioning)
  • tables are optimized for GFS (Google File System) by being split into multiple tablets: segments of the table, split at a row boundary chosen so that each tablet is about 200 megabytes in size

Architecture

Bigtable is not a relational database. It supports neither joins nor rich SQL-like queries. Each table is a multidimensional sparse map. Tables consist of rows and columns, and each cell has a timestamp. There can be multiple versions of a cell with different timestamps. The timestamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than a specific date/time."
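The data model described above can be illustrated with a small in-memory sketch. This is not Bigtable's API: the class `SparseTable` and its method names are hypothetical, and timestamps are plain integers chosen for the example. It only shows how a map keyed by (row, column, timestamp) supports "n newest versions" and "delete older than" operations.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a Bigtable-style map: (row, column, timestamp) -> value."""

    def __init__(self):
        # Only cells that were actually written are stored, so the map is sparse.
        self._cells = defaultdict(list)  # (row, column) -> [(timestamp, value), ...]

    def put(self, row, column, timestamp, value):
        versions = self._cells[(row, column)]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # keep newest first

    def get_versions(self, row, column, n=1):
        """Return up to n newest (timestamp, value) pairs for one cell."""
        return self._cells.get((row, column), [])[:n]

    def delete_older_than(self, cutoff):
        """Drop every cell version whose timestamp is below the cutoff."""
        for key in list(self._cells):
            kept = [(ts, v) for ts, v in self._cells[key] if ts >= cutoff]
            if kept:
                self._cells[key] = kept
            else:
                del self._cells[key]


table = SparseTable()
for ts, html in [(1, "<html>v1"), (2, "<html>v2"), (3, "<html>v3")]:
    table.put("com.cnn.www", "contents:", ts, html)

print(table.get_versions("com.cnn.www", "contents:", n=2))
# prints [(3, '<html>v3'), (2, '<html>v2')]
```

A real deployment keeps rows sorted and distributed across servers; the dictionary here only stands in for the logical shape of the map.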

To manage the huge tables, Bigtable splits tables at row boundaries and saves them as tablets. A tablet is around 200 MB, and each machine saves about 100 tablets. This setup allows tablets from a single table to be spread among many servers. It also allows for fine-grained load balancing. If one tablet is receiving many queries, its server can shed other tablets or move the busy tablet to another machine that is not so busy. Also, if a machine goes down, its tablets may be spread across many other servers so that the performance impact on any given machine is minimal.
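Splitting at row boundaries can be sketched as a greedy pass over rows in key order. The function name `split_into_tablets` and the (row key, size) input format are assumptions made for this illustration; the real system splits tablets incrementally as they grow rather than in one batch.

```python
TABLET_LIMIT_BYTES = 200 * 1024 * 1024  # ~200 MB, the figure quoted above


def split_into_tablets(rows, limit=TABLET_LIMIT_BYTES):
    """Group (row_key, size_in_bytes) pairs into tablets of contiguous keys.

    A tablet is closed at a row boundary as soon as adding the next row
    would push it past the limit, so no row is split across tablets.
    """
    tablets, current, current_size = [], [], 0
    for key, size in sorted(rows):
        if current and current_size + size > limit:
            tablets.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        tablets.append(current)
    return tablets


rows = [("a", 60), ("b", 60), ("c", 60), ("d", 60)]
print(split_into_tablets(rows, limit=150))
# prints [['a', 'b'], ['c', 'd']]
```

Because each tablet covers a contiguous key range, a tablet can be handed to another server without touching the rows in any other tablet.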

Tables are stored as immutable SSTables and a tail of logs (one log per machine). When a machine runs out of system memory, it compresses some tablets using Google proprietary compression techniques (BMDiff and Zippy). Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.

The locations of Bigtable tablets are stored in cells. The lookup of any particular tablet is handled by a three-tiered system. The clients get a pointer to a META0 table, of which there is only one. The META0 table keeps track of many META1 tablets that contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.
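The lookup path above can be approximated with two levels of sorted indexes. Everything here is hypothetical: the index contents, the server names, and the convention that each entry stores the last row key its range covers. Pre-fetching and caching are left out to keep the sketch short.

```python
import bisect

LAST_KEY = "\uffff"  # sentinel that sorts after the row keys used here


def find_entry(index, row_key):
    """Return the value of the first entry whose end key is >= row_key.

    index is a list of (end_key, value) pairs sorted by end_key.
    """
    end_keys = [end for end, _ in index]
    pos = bisect.bisect_left(end_keys, row_key)
    if pos == len(index):
        raise KeyError(row_key)
    return index[pos][1]


# Tier 1: the single META0 table points at META1 tablets.
META0 = [("m", "meta1-a"), (LAST_KEY, "meta1-b")]

# Tier 2: each META1 tablet points at the servers holding user tablets.
META1 = {
    "meta1-a": [("g", "server-1"), ("m", "server-2")],
    "meta1-b": [("t", "server-3"), (LAST_KEY, "server-4")],
}


def locate(row_key):
    """Tier 3: resolve a row key to the server holding its tablet."""
    meta1_name = find_entry(META0, row_key)
    return find_entry(META1[meta1_name], row_key)


print(locate("apple"), locate("home"), locate("zebra"))
# prints server-1 server-2 server-4
```

A client that caches the META1 entries it has already seen can skip both metadata reads for later requests in the same key range.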

Implementation

Bigtable is built on Google File System (GFS), which is used as a backing store for log and data files. GFS provides reliable storage for SSTables, a Google-proprietary file format used to persist table data.

Another service that Bigtable makes heavy use of is Chubby, a highly available, reliable distributed lock service. Chubby allows a client to take a lock, possibly associating it with some metadata, which the client can renew by sending keep-alive messages back to Chubby. The locks are stored in a filesystem-like hierarchical naming structure.
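The lock-with-renewal idea can be sketched as a lease table. This is not Chubby's interface: `LeaseLockService`, its methods, the lease length, and the path `/bigtable/master` are invented for the example, and time is passed in explicitly so the behavior is easy to follow.

```python
class LeaseLockService:
    """Toy in-memory lock table keyed by filesystem-like paths."""

    def __init__(self, lease_seconds=10.0):
        self.lease_seconds = lease_seconds
        self._locks = {}  # path -> (holder, metadata, expiry_time)

    def acquire(self, path, holder, now, metadata=None):
        current = self._locks.get(path)
        if current and current[0] != holder and current[2] > now:
            return False  # another client holds an unexpired lease
        self._locks[path] = (holder, metadata, now + self.lease_seconds)
        return True

    def keep_alive(self, path, holder, now):
        current = self._locks.get(path)
        if not current or current[0] != holder or current[2] <= now:
            return False  # the lease was lost or never held
        self._locks[path] = (holder, current[1], now + self.lease_seconds)
        return True


locks = LeaseLockService(lease_seconds=10)
print(locks.acquire("/bigtable/master", "master-1", now=0))   # prints True
print(locks.acquire("/bigtable/master", "master-2", now=5))   # prints False
print(locks.keep_alive("/bigtable/master", "master-1", now=8))  # prints True
print(locks.acquire("/bigtable/master", "master-2", now=20))  # prints True
```

The last call succeeds because the first holder stopped renewing and its lease ran out at time 18, which is how a lock of this kind recovers from a crashed client.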

There are three primary server types of interest in the Bigtable system:

  1. Master servers: assign tablets to tablet servers, keep track of where tablets are located, and redistribute tasks as needed.
  2. Tablet servers: handle read/write requests for tablets and split tablets when they exceed size limits (usually 100 MB to 200 MB). If a tablet server fails, then 100 tablet servers each pick up one new tablet and the system recovers.
  3. Lock servers: instances of the Chubby distributed lock service. Many actions within Bigtable require acquiring locks, including opening tablets for writing, ensuring that there is no more than one active master at a time, and access control checking.
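The recovery step in item 2 can be pictured as the master handing each orphaned tablet to whichever live server currently holds the fewest. The function `reassign_tablets` and the dictionary layout are assumptions for this sketch; the real master also has to detect the failure and replay logs, which is not shown.

```python
def reassign_tablets(assignments, failed_server):
    """Move a failed server's tablets to the least-loaded live servers.

    assignments maps server name -> list of tablet names; it is updated
    in place and also returned.
    """
    orphaned = assignments.pop(failed_server, [])
    if orphaned and not assignments:
        raise RuntimeError("no live tablet servers left")
    for tablet in orphaned:
        # Pick the live server that currently holds the fewest tablets.
        target = min(assignments, key=lambda server: len(assignments[server]))
        assignments[target].append(tablet)
    return assignments


cluster = {"ts1": ["t1", "t2"], "ts2": ["t3"], "ts3": ["t4"]}
print(reassign_tablets(cluster, "ts1"))
# prints {'ts2': ['t3', 't1'], 'ts3': ['t4', 't2']}
```

With about 100 tablets per server and many live servers, this policy gives each surviving server roughly one extra tablet, which matches the recovery behavior described in the list.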

API

Typical Bigtable operations are creating and deleting tables and column families, writing data, and deleting columns from a row. Bigtable provides these functions to application developers through an API. Transactions are supported at the row level, but not across several row keys.
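Row-level atomicity can be sketched with one lock per row key. The class `RowStore` and its methods are hypothetical and are not the Bigtable client API; the point is only that all changes to a single row become visible together, while nothing coordinates changes across different rows.

```python
import threading
from collections import defaultdict


class RowStore:
    """Toy store where each mutate_row call is atomic for a single row."""

    def __init__(self):
        self._rows = {}                               # row_key -> {column: value}
        self._row_locks = defaultdict(threading.Lock)  # one lock per row key
        self._guard = threading.Lock()                 # protects the lock table

    def _lock_for(self, row_key):
        with self._guard:
            return self._row_locks[row_key]

    def mutate_row(self, row_key, sets=None, deletes=()):
        """Apply deletes and then writes to one row as a single unit."""
        with self._lock_for(row_key):
            row = dict(self._rows.get(row_key, {}))  # work on a copy
            for column in deletes:
                row.pop(column, None)
            row.update(sets or {})
            self._rows[row_key] = row                # publish in one step

    def read_row(self, row_key):
        with self._lock_for(row_key):
            return dict(self._rows.get(row_key, {}))


store = RowStore()
store.mutate_row("user#1", sets={"cf:name": "Ada", "cf:tmp": "x"})
store.mutate_row("user#1", sets={"cf:age": "36"}, deletes=["cf:tmp"])
print(store.read_row("user#1"))
# prints {'cf:name': 'Ada', 'cf:age': '36'}
```

Updating two rows requires two separate `mutate_row` calls, so a reader may observe one change without the other, which mirrors the "not across several row keys" limitation.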

528 questions
3 votes, 1 answer

How do I delete filtered rows in BigTable GCP

I am trying to delete filtered rows in BigTable. I have a table with an empty value in a cell, and I would like to remove such rows from the table. I wrote a filter that selects the relevant rows, but when I try to delete the rows, I get an error. …
3 votes, 2 answers

Bigtable row key scenario to avoid hotspotting?

Bigtable row key scenario to avoid hotspotting? A company needs you to create a schema in Google Bigtable that will allow for the historical analysis of the last 2 years of records. Each record that comes in is sent every 15 minutes, and contains a…
Roshan Fernando
3 votes, 1 answer

Disable Cloud Bigtable cluster to avoid getting billed, but without deleting data

I have created a development Cloud Bigtable cluster and would like to disable this when I am not working on it to avoid getting billed, but the only option I see is to delete the cluster; doing this will require me to recreate the tables which I…
3 votes, 2 answers

How do I assign a oneof field on a protobuf message if the child message has no fields?

I want to create a BigTable DeleteFromRow mutation. The proto for the Mutation and the DeleteFromRow look like this: oneof mutation { // Set a cell's value. SetCell set_cell = 1; // Deletes cells from a column. DeleteFromColumn…
bartaelterman
3 votes, 1 answer

ListProperty vs StringListProperty on Google App Engine

I want to store lists of integers (user ids). Should I make them strings and use a StringListProperty, or just use a ListProperty? I'm wondering which is more optimized: the specific StringListProperty or the heterogeneous ListProperty (when used…
Alex Amato
3 votes, 2 answers

OpenTSDB some data lost

I'm working with Google Cloud Platform GKE and am now using Kubernetes. I am trying to use OpenTSDB with Google Bigtable, and it is QA time. However, an unexpected bug has appeared: when I put some data, it is not shown. Even long time goes…
3 votes, 4 answers

How to compose a row key in BigTable?

In https://cloud.google.com/bigtable/docs/schema-design it is clearly described how to choose the row key of a table. But I could not find any info on how to compose this row key. Where and by what means is it composed?
Mike
3 votes, 1 answer

Golang listenUDP multiple ports blocking with BigTable connection

I'm creating a simple udp client that listens on multiple ports and saves the request to bigtable. It's essential to listen on different ports before you ask. Everything was working nicely until I included bigtable. After doing so, the listeners…
Jenny Blunt
3 votes, 1 answer

How data is stored physically in Bigtable

Let's assume a table test cf:a cf:b yy:a kk:cat "com.cnn.news" zubrava10 sobaka foobar "ch.main.users" - - - purrpurr And the first cell ("zubrava") has 10 versions (10…
pavelkolodin
3 votes, 2 answers

Learning Google App Engine & BigTable

I have a traditional RDBMS based PHP app that I need to convert over to GAE and would like to properly learn how BigTable works prior to doing this. However, I'd kinda like to do it through sample problems or examples that show the maximal way to…
ylluminate
3 votes, 4 answers

Is there any security concern with displaying the Key value to users in a URL?

I am using the Key value of entities in my datastore as the unique identifier in the URL for pulling up a record: http://mysite.appspot.com/myaction/1x7s3fgdlbnRlcklkcicLAbcXc2VyQWNjb3VudCIFYW9uZ This is not a very attractive solution, nor is it…
Egg Yolk
3 votes, 4 answers

is it possible to share a datastore between multiple GAE applications

I would like to work with data saved in one GAE application from other GAE applications. Basically, I want to share the datastore between multiple web applications in Google App Engine (Python), in both Development and Production. Also if possible…
Brian
3 votes, 4 answers

How do the newer database models achieve better scalability and performance as compared to a traditional RDBMS implementation?

We have BigTable from Google, Hadoop, actively contributed by Yahoo, Dynamo from Amazon all aiming towards one common goal - making data management as scalable as possible. By scalability what I understand is that the cost of the usage should not…
Moeb
3 votes, 2 answers

Bigtable CSV import

I have a large csv dataset (>5TB) in multiple files (stored in a storage bucket) that I need to import into Google Bigtable. The files are in the format: rowkey,s1,s2,s3,s4 text,int,int,int,int ... There is an importtsv function with hbase that…
mattrix
3 votes, 1 answer

What is sparse and purpose of sparse table in Bigtable?

I have some information that I don't understand: Bigtable may be understood as a sparse table. Most cells contain null values - too sparse to store it as in relational database systems. Bigtable rather implements a multi-dimensional sparse map. Is it…
Humaun Rashid Nayan