Questions tagged [bigtable]

Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.

Bigtable

A Distributed Storage System for Structured Data

Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).

Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.

Some features

fast and extremely large-scale DBMS
a sparse, distributed multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases.
designed to scale into the petabyte range
it works across hundreds or thousands of machines
it is easy to add more machines to the system and automatically start taking advantage of those resources without any reconfiguration
each table has multiple dimensions (one of which is a field for time, allowing versioning)
tables are optimized for GFS (Google File System) by being split into multiple tablets: segments of the table, split at row boundaries chosen so that each tablet is roughly 200 megabytes in size.

Architecture

Bigtable is not a relational database. It supports neither joins nor rich SQL-like queries. Each table is a multidimensional sparse map. Tables consist of rows and columns, and each cell has a timestamp. There can be multiple versions of a cell with different timestamps. The timestamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than a specific date/time."
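The data model described above can be illustrated with a small in-memory sketch. This is only a toy model, not Bigtable's API: the class name SparseTable, its methods, and the example row keys and column names are all hypothetical, and timestamps are plain integers.

```python
from collections import defaultdict


class SparseTable:
    """Toy model of a sparse map: (row key, column, timestamp) -> value."""

    def __init__(self):
        # row key -> column -> list of (timestamp, value), newest first
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        versions = self._rows[row][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get_versions(self, row, column, n=1):
        """Return up to n most recent (timestamp, value) pairs for a cell."""
        if row not in self._rows or column not in self._rows[row]:
            return []
        return self._rows[row][column][:n]

    def delete_older_than(self, cutoff):
        """Drop every cell version whose timestamp is below cutoff."""
        for row in list(self._rows):
            columns = self._rows[row]
            for column in list(columns):
                kept = [(ts, v) for ts, v in columns[column] if ts >= cutoff]
                if kept:
                    columns[column] = kept
                else:
                    del columns[column]
            if not columns:
                del self._rows[row]

    def scan(self, start_row, end_row):
        """Yield row keys in sorted order within [start_row, end_row)."""
        for row in sorted(self._rows):
            if start_row <= row < end_row:
                yield row


table = SparseTable()
table.put("com.example/index.html", "contents:html", 1, "v1")
table.put("com.example/index.html", "contents:html", 2, "v2")
table.put("com.example/index.html", "contents:html", 3, "v3")
table.put("com.example/about.html", "contents:html", 5, "about")

# "select 'n' versions of this Web page"
latest_two = table.get_versions("com.example/index.html", "contents:html", n=2)

# "delete cells that are older than a specific date/time"
table.delete_older_than(3)
remaining = table.get_versions("com.example/index.html", "contents:html", n=5)
```

The map is sparse because a row only stores the columns that were actually written, and it is sorted in the sense that scan walks row keys in lexicographic order.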

To manage huge tables, Bigtable splits them at row boundaries and saves the pieces as tablets. A tablet is around 200 MB, and each machine stores about 100 tablets. This setup allows tablets from a single table to be spread among many servers and also allows fine-grained load balancing: if one tablet is receiving many queries, its server can shed other tablets or move the busy tablet to another machine that is less busy. Also, if a machine goes down, its tablets can be spread across many other servers so that the performance impact on any given machine is minimal.
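Splitting at row boundaries and then routing a row key to its tablet can be sketched as follows. This is a simplified illustration under assumptions: the function names are hypothetical, row sizes are given up front, and the byte limit in the example is tiny so the result is easy to follow.

```python
import bisect


def split_into_tablets(sorted_rows, max_tablet_bytes):
    """Group rows into tablets without ever splitting a row.

    sorted_rows: list of (row_key, size_in_bytes), sorted by row_key.
    Returns a list of tablets, each a list of consecutive row keys.
    """
    tablets, current, current_bytes = [], [], 0
    for key, size in sorted_rows:
        # Start a new tablet when adding this row would exceed the limit.
        if current and current_bytes + size > max_tablet_bytes:
            tablets.append(current)
            current, current_bytes = [], 0
        current.append(key)
        current_bytes += size
    if current:
        tablets.append(current)
    return tablets


def find_tablet(tablets, row_key):
    """Return the index of the tablet whose key range covers row_key."""
    end_keys = [tablet[-1] for tablet in tablets]
    idx = bisect.bisect_left(end_keys, row_key)
    # Keys past the last end key fall into the final tablet.
    return min(idx, len(tablets) - 1)


rows = [("a", 100), ("b", 100), ("c", 100), ("d", 100), ("e", 100)]
tablets = split_into_tablets(rows, max_tablet_bytes=250)
```

Because tablets hold contiguous, sorted key ranges, a binary search over the last key of each tablet is enough to find where a row lives.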

Tables are stored as immutable SSTables and a tail of logs (one log per machine). When a machine runs out of system memory, it compresses some tablets using Google proprietary compression techniques (BMDiff and Zippy). Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.

The locations of Bigtable tablets are stored in cells. The lookup of any particular tablet is handled by a three-tiered system. Clients get a pointer to the META0 table, of which there is only one. The META0 table keeps track of many META1 tablets that contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.
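The tiered lookup can be sketched roughly as below. Everything here is an assumption made for illustration: the TabletLocator class, the layout of the metadata entries (each entry maps the last row key of a range to a target), and the per-row-key cache, which is cruder than caching whole key ranges.

```python
import bisect


class TabletLocator:
    """Toy client-side lookup: META0 -> META1 tablet -> tablet server."""

    def __init__(self, meta0, meta1_tablets):
        # meta0: sorted list of (last_row_key, meta1_id)
        # meta1_tablets: meta1_id -> sorted list of (last_row_key, server)
        self.meta0 = meta0
        self.meta1_tablets = meta1_tablets
        self.cache = {}       # row_key -> server, filled on first lookup
        self.meta_reads = 0   # counts metadata reads to show the cache effect

    @staticmethod
    def _lookup(entries, row_key):
        keys = [key for key, _ in entries]
        idx = bisect.bisect_left(keys, row_key)
        if idx == len(entries):
            raise KeyError(row_key)
        return entries[idx]

    def locate(self, row_key):
        if row_key in self.cache:
            return self.cache[row_key]
        _, meta1_id = self._lookup(self.meta0, row_key)
        self.meta_reads += 1
        _, server = self._lookup(self.meta1_tablets[meta1_id], row_key)
        self.meta_reads += 1
        self.cache[row_key] = server
        return server


meta0 = [("m", "meta1-a"), ("\xff", "meta1-b")]
meta1_tablets = {
    "meta1-a": [("f", "server-1"), ("m", "server-2")],
    "meta1-b": [("t", "server-3"), ("\xff", "server-4")],
}
locator = TabletLocator(meta0, meta1_tablets)
first = locator.locate("g")
reads_after_first = locator.meta_reads
second = locator.locate("g")  # served from the cache, no metadata reads
```

The second lookup for the same key does not touch META0 or META1 at all, which is the point of the caching mentioned above.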

Implementation

BigTable is built on Google File System (GFS), which is used as a backing store for log and data files. GFS provides reliable storage for SSTables, a Google-proprietary file format used to persist table data.

Another service that Bigtable makes heavy use of is Chubby, a highly available, reliable distributed lock service. Chubby allows a client to take a lock, optionally associating some metadata with it, and to renew the lock by sending keep-alive messages to Chubby. The locks are stored in a filesystem-like hierarchical naming structure.
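The lock-with-keep-alive idea can be sketched as a lease that expires unless its owner renews it. This is a single-process toy, not Chubby's interface: the LeaseLockService class, its method names, the lock path, and the explicit now argument (used instead of a real clock so the behaviour is easy to follow) are all assumptions, and nothing here models replication or sessions.

```python
class LeaseLockService:
    """Toy lease-based lock service keyed by hierarchical path names."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self._locks = {}  # path -> (owner, expires_at, metadata)

    def acquire(self, path, owner, now, metadata=None):
        """Take the lock unless another owner holds an unexpired lease."""
        entry = self._locks.get(path)
        if entry is not None and entry[0] != owner and entry[1] > now:
            return False
        self._locks[path] = (owner, now + self.lease_seconds, metadata)
        return True

    def keep_alive(self, path, owner, now):
        """Extend the lease; fails if the caller no longer holds the lock."""
        entry = self._locks.get(path)
        if entry is None or entry[0] != owner or entry[1] <= now:
            return False
        self._locks[path] = (owner, now + self.lease_seconds, entry[2])
        return True

    def holder(self, path, now):
        """Return the current owner, or None if the lease has expired."""
        entry = self._locks.get(path)
        if entry is None or entry[1] <= now:
            return None
        return entry[0]


service = LeaseLockService(lease_seconds=10)
path = "/bigtable/master-lock"  # hypothetical path in the naming hierarchy
got_first = service.acquire(path, "master-1", now=0, metadata="host-a")
got_second = service.acquire(path, "master-2", now=5)   # lease still valid
renewed = service.keep_alive(path, "master-1", now=8)   # lease now ends at 18
holder_at_15 = service.holder(path, now=15)
took_over = service.acquire(path, "master-2", now=20)   # lease has expired
```

A holder that stops sending keep-alives simply loses the lock when the lease runs out, which is how a design like this can ensure there is at most one active holder at a time.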

There are three primary server types of interest in the Bigtable system:

Master servers: assign tablets to tablet servers, keep track of where tablets are located, and redistribute tasks as needed.
Tablet servers: handle read/write requests for tablets and split tablets when they exceed size limits (usually 100 MB to 200 MB). If a tablet server fails, then 100 tablet servers each pick up one new tablet and the system recovers.
Lock servers: instances of the Chubby distributed lock service. Many actions within Bigtable require acquisition of locks, including opening tablets for writing, ensuring that there is no more than one active master at a time, and access control checking.

API

Typical operations on Bigtable are creating and deleting tables and column families, writing data, and deleting columns from a row. Bigtable provides these functions to application developers in an API. Transactions are supported at the row level, but not across several row keys.
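Row-level transactions can be pictured as one lock per row key: all changes to a single row are applied atomically, while nothing coordinates changes across different rows. The sketch below is an in-memory illustration only; RowAtomicStore and its methods are hypothetical names, not the Bigtable client API.

```python
import threading
from collections import defaultdict


class RowAtomicStore:
    """Toy store where each row key has its own lock."""

    def __init__(self):
        self._rows = defaultdict(dict)                 # row key -> {column: value}
        self._row_locks = defaultdict(threading.Lock)  # row key -> lock
        self._locks_guard = threading.Lock()           # protects _row_locks

    def _lock_for(self, row_key):
        with self._locks_guard:
            return self._row_locks[row_key]

    def mutate_row(self, row_key, sets=None, deletes=None):
        """Apply all column writes and deletes for one row atomically."""
        with self._lock_for(row_key):
            row = dict(self._rows[row_key])
            for column in deletes or []:
                row.pop(column, None)
            row.update(sets or {})
            self._rows[row_key] = row

    def read_modify_write(self, row_key, column, update):
        """Atomically replace a cell with update(old_value)."""
        with self._lock_for(row_key):
            row = self._rows[row_key]
            row[column] = update(row.get(column))
            return row[column]

    def read_row(self, row_key):
        with self._lock_for(row_key):
            return dict(self._rows.get(row_key, {}))


store = RowAtomicStore()
store.mutate_row("row1", sets={"info:name": "example", "info:tmp": "x"})
store.mutate_row("row1", deletes=["info:tmp"])


def worker():
    for _ in range(100):
        store.read_modify_write("row1", "stats:hits", lambda v: (v or 0) + 1)


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A change that touches two different row keys would need two separate calls here, and another client could observe the state between them; that is the limitation the last sentence above refers to.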

References

Whitepaper Bigtable: A Distributed Storage System for Structured Data
Whitepaper The Google File System
Whitepaper The Chubby lock service for loosely-coupled distributed systems

Related Tags

google-bigquery Google's managed analytics data warehouse (the managed, commercial Bigtable service is tagged google-cloud-bigtable)
hbase open source implementation of BigTable

528 questions

1 answer

How to connect to Bigtable Emulator from a GoLang application? How to use it?

I am trying to use the Bigtable Emulator, which I have never used before. I followed the documentation but could not understand how to connect an application to the emulator or how to set the BIGTABLE_EMULATOR_HOST environment variable. Please help by…

go bigtable google-cloud-bigtable

asked Jun 17 '18 at 15:06

yogesh_desai

2 answers

Do a mass db.delete on App Engine, without eating CPU

We've got a reasonably-sized database on Google App Engine - just over 50,000 entities - that we want to clear out stale data from. The plan was to write a deferred task to iterate over the entities we no longer wanted, and delete them in…

python google-app-engine cpu-usage bigtable

asked Dec 15 '10 at 08:42

Blair Holloway

3 answers

Database design - google app engine

I am working with Google App Engine and using the low-level Java API to access Bigtable. I'm building a SaaS application with 4 layers: client web browser, RESTful resources layer, business layer, data access layer. I'm building an application to…

google-app-engine database-design bigtable appointment

asked Jun 25 '10 at 17:47

Chris Dutrow

3 answers

What aspect of relational databases makes it difficult for them to scale sufficiently on services like Google App Engine?

Apparently the reason for the BigTable architecture has to do with the difficulty scaling relational databases when you're dealing with the massive number of servers that Google has to deal with. But technically speaking what exactly makes it…

database google-app-engine scalability relational-database bigtable

asked Jan 30 '10 at 05:20

pacman

2 answers

Does google cloud BigTable have a data browser?

I need to view the data in a BigTable table, but I can't find a data browser in the web console. (Dynamo has a nice browser in the AWS web console.) Is there a data browser for BigTable, or am I limited to the cbt command line?

google-cloud-bigtable bigtable

asked Oct 21 '21 at 00:50

Dean Schulze

2 answers

Trying to simulate cell level TTL in bigtable but whole column family data is getting removed by garbage collection

created a table with the following rules: so with this, data should expire after 1 second (as per docs) async function createTable() { console.log("Creating Table"); const options = { families: [ { name:…

node.js google-cloud-platform google-cloud-functions google-cloud-bigtable bigtable

asked Dec 17 '19 at 05:17

Sumeet.Jain

1 answer

Big table vs Big Query usecase for timeseries data

I am looking to finalize on Big table vs Big Query for my usecase of timeseries data. I had gone through https://cloud.google.com/bigtable/docs/schema-design-time-series This is for storing an Omniture data which contains information like website…

google-bigquery bigtable google-cloud-bigtable

asked Sep 18 '18 at 18:51

Roshan Fernando

14 answers

SQL query : inner joins optimization between big tables

I have the 3 following tables in a MySQL 4.x DB : hosts: (300.000 records) id (UNSIGNED INT) PRIMARY KEY name (VARCHAR 100) paths: (6.000.000 records) id (UNSIGNED INT) PRIMARY KEY name (VARCHAR 100) urls: (7.000.000 records) host (UNSIGNED…

sql mysql optimization inner-join bigtable

asked Feb 04 '09 at 13:54

Nicolas

1 answer

Maintain data in Google Bigtable for longer periods

We have use-cases where we would like to store a large volume of data in Google Bigtable for long periods: during product development for performance tuning for demos We need to store the data but we don't really need it to be "online" all the…

bigtable google-cloud-bigtable

asked May 28 '17 at 17:30

Sachin Hejip

2 answers

How to connect to a running bigtable emulator from java

I am trying to use the bigtable emulator from gcloud beta emulators. I launch the emulator, grab the hostname (localhost) and port (in this instance 8885) gcloud beta emulators bigtable start Executing:…

java bigtable google-cloud-bigtable

asked Jul 25 '16 at 20:48

user1568967

2 answers

What is the benefit of a Key-Value Store over Bigtable?

What is the point of using a dedicated Key-Value Store over Bigtable? My understanding of Bigtable is that it is implemented under the hood with SSTables which are key value based. Given that, then what technical implementation advantages does a…

amazon-dynamodb riak bigtable key-value-store

asked Dec 13 '12 at 10:20

user782220

1 answer

Recursive Relationship with Google App Engine and BigTable

In a classic relational database, I have the following table: CREATE TABLE Person( Id int IDENTITY(1,1) NOT NULL PRIMARY KEY, MotherId int NOT NULL REFERENCES Person(Id), FatherId int NOT NULL REFERENCES Person(Id), FirstName…

python database-design google-app-engine bigtable

asked Jun 02 '09 at 05:38

Martin

2 answers

Which database technology for big structured data?

Scenario: Suppose you have 90TB of text in 200 tables. This is structured, related data, comparable to DBpedia, only with more data. Any truly relational, distributed, and performant database would do the job. Don't expect as many updates as a social…

mysql mongodb cloud cassandra bigtable

asked Apr 21 '11 at 06:39

Jonas

2 answers

Explanation of performance considerations of read/write on Google Datastore (GAE)?

I'm having a very difficult time understanding the mechanics of the Google App Engine Datastore. I want to understand the mechanics so I can build my database in an optimal way for the database. Given my example below, can someone help me…

google-app-engine google-cloud-datastore bigtable

asked Feb 17 '11 at 19:38

Ryan

4 answers

How to efficiently read rows from Google BigTable into a pandas DataFrame

Use case: I am using Google BigTable to store counts like this: | rowkey | columnfamily | | | col1 | col2 | col3 | |----------|------|------|------| | row1 | 1 | 2 | 3 | | row2 | 2 | 4 | 8 | | row3 | 3 …

python pandas bigtable pyarrow

asked Feb 16 '18 at 14:07

bartaelterman
