Questions tagged [bigtable]

Bigtable

A Distributed Storage System for Structured Data

Bigtable is a distributed storage system (built by Google) for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.

Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving).

Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products.

Some features

  • fast and extremely large-scale DBMS
  • a sparse, distributed, multi-dimensional sorted map, sharing characteristics of both row-oriented and column-oriented databases (a minimal sketch follows this list).
  • designed to scale into the petabyte range
  • it works across hundreds or thousands of machines
  • it is easy to add more machines to the system, which automatically starts taking advantage of those resources without any reconfiguration
  • each table has multiple dimensions (one of which is a field for time, allowing versioning)
  • tables are optimized for GFS (Google File System) by being split into multiple tablets - segments of the table, split along a row key chosen so that each tablet is roughly 200 megabytes in size.
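
To make the data model concrete, here is a minimal, single-machine sketch of that sparse, multi-dimensional sorted map, using plain Java collections rather than anything from Bigtable itself: rows map to columns, columns map to timestamped versions, and cells that are never written simply do not exist.

    import java.util.Comparator;
    import java.util.NavigableMap;
    import java.util.TreeMap;

    /**
     * A toy, single-machine sketch of Bigtable's logical data model:
     * (row key, column, timestamp) -> value. Illustrative only; the real
     * system shards this map into tablets and persists it in SSTables on GFS.
     */
    public class SparseSortedMapSketch {

        // row key -> column ("family:qualifier") -> timestamp (newest first) -> value
        private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
                new TreeMap<>();

        /** Writes one versioned cell; rows and columns never written cost nothing (the map is sparse). */
        public void put(String rowKey, String column, long timestampMicros, byte[] value) {
            rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
                .computeIfAbsent(column, k -> new TreeMap<>(Comparator.reverseOrder()))
                .put(timestampMicros, value);
        }

        /** Returns the newest version of a cell, or null if that cell was never written. */
        public byte[] getLatest(String rowKey, String column) {
            NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(rowKey);
            if (columns == null) return null;
            NavigableMap<Long, byte[]> versions = columns.get(column);
            return (versions == null || versions.isEmpty()) ? null : versions.firstEntry().getValue();
        }
    }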

Architecture

BigTable is not a relational database. It does not support joins, nor does it support rich SQL-like queries. Each table is a multidimensional sparse map. Tables consist of rows and columns, and each cell has a timestamp; there can be multiple versions of a cell with different timestamps. The timestamp allows for operations such as "select 'n' versions of this Web page" or "delete cells that are older than a specific date/time."
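
As an illustration of these versioned-cell operations, the sketch below uses the Cloud Bigtable Java client (the google-cloud-bigtable library; the internal API from the original paper is not public) to read the three most recent versions of every cell in one row. The project, instance, table, and row key values are placeholders.

    import com.google.cloud.bigtable.data.v2.BigtableDataClient;
    import com.google.cloud.bigtable.data.v2.models.Query;
    import com.google.cloud.bigtable.data.v2.models.Row;
    import com.google.cloud.bigtable.data.v2.models.RowCell;

    import static com.google.cloud.bigtable.data.v2.models.Filters.FILTERS;

    public class ReadVersionsExample {
        public static void main(String[] args) throws Exception {
            // Placeholder project and instance identifiers.
            try (BigtableDataClient client = BigtableDataClient.create("my-project", "my-instance")) {
                // "Select the 3 most recent versions of every cell in this row."
                Query query = Query.create("webtable")
                        .rowKey("com.example.www")
                        .filter(FILTERS.limit().cellsPerColumn(3));
                for (Row row : client.readRows(query)) {
                    for (RowCell cell : row.getCells()) {
                        System.out.printf("%s:%s @ %d -> %s%n",
                                cell.getFamily(),
                                cell.getQualifier().toStringUtf8(),
                                cell.getTimestamp(),           // microseconds since epoch
                                cell.getValue().toStringUtf8());
                    }
                }
            }
        }
    }

Setting cellsPerColumn(1) instead would return only the latest version of each cell.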

In order to manage the huge tables, Bigtable splits tables at row boundaries and saves them as tablets. A tablet is around 200 MB, and each machine saves about 100 tablets. This setup allows tablets from a single table to be spread among many servers. It also allows for fine-grained load balancing: if one tablet is receiving many queries, its server can shed other tablets or move the busy tablet to another machine that is less busy. Also, if a machine goes down, its tablets can be spread across many other servers so that the performance impact on any given machine is minimal.
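
The splitting behaviour can be sketched as a toy model (this is not Bigtable's real code, and all names are made up): when a tablet's data grows past a size limit, it is split at a row boundary so the two halves can be served and rebalanced independently.

    import java.util.Map;
    import java.util.TreeMap;

    /** A toy sketch of splitting an oversized tablet at a middle row key. */
    public class TabletSplitSketch {
        static final long SPLIT_THRESHOLD_BYTES = 200L * 1024 * 1024; // ~200 MB per tablet

        /** A tablet is a contiguous, sorted range of rows (size accounting ignores overwrites). */
        static class Tablet {
            final TreeMap<String, byte[]> rows = new TreeMap<>();
            long sizeBytes = 0;

            void put(String rowKey, byte[] value) {
                rows.put(rowKey, value);
                sizeBytes += rowKey.length() + value.length;
            }
        }

        /** If the tablet is too big, split it at a middle row key and return the new upper half. */
        static Tablet maybeSplit(Tablet tablet) {
            if (tablet.sizeBytes < SPLIT_THRESHOLD_BYTES) {
                return null; // still small enough to serve as a single tablet
            }
            // Choose a split row roughly in the middle of the key range.
            String splitKey = tablet.rows.keySet().stream()
                    .skip(tablet.rows.size() / 2)
                    .findFirst()
                    .orElseThrow(IllegalStateException::new);
            // Move everything from the split row onward into a new tablet.
            Tablet upper = new Tablet();
            for (Map.Entry<String, byte[]> e : new TreeMap<>(tablet.rows.tailMap(splitKey)).entrySet()) {
                upper.put(e.getKey(), e.getValue());
                tablet.rows.remove(e.getKey());
                tablet.sizeBytes -= e.getKey().length() + e.getValue().length;
            }
            return upper;
        }
    }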

Tables are stored as immutable SSTables plus a tail of logs (one log per machine). When a machine runs out of system memory, it compresses some tablets using Google-proprietary compression techniques (BMDiff and Zippy). Minor compactions involve only a few tablets, while major compactions involve the whole table system and recover hard-disk space.
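
That write path can be sketched roughly as follows. This is a toy model under stated assumptions (no commit log, no compression, no GFS), intended only to show how an in-memory buffer is frozen into an immutable sorted run during a minor compaction.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.SortedMap;
    import java.util.TreeMap;

    /** A toy sketch of the memtable-plus-immutable-runs write path. */
    public class MinorCompactionSketch {
        private TreeMap<String, byte[]> memtable = new TreeMap<>();
        private final List<SortedMap<String, byte[]>> immutableRuns = new ArrayList<>();
        private final int flushThreshold = 1000; // illustrative size limit

        public void write(String key, byte[] value) {
            // A real tablet server would also append the mutation to a commit log on GFS.
            memtable.put(key, value);
            if (memtable.size() >= flushThreshold) {
                minorCompaction();
            }
        }

        private void minorCompaction() {
            // Freeze the current memtable as an immutable, sorted run and start a new one.
            immutableRuns.add(Collections.unmodifiableSortedMap(memtable));
            memtable = new TreeMap<>();
        }
    }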

The locations of Bigtable tablets are themselves stored in Bigtable cells. The lookup of any particular tablet is handled by a three-tiered system. Clients get a pointer to the META0 tablet, of which there is only one. The META0 tablet keeps track of many META1 tablets, which contain the locations of the tablets being looked up. Both META0 and META1 make heavy use of pre-fetching and caching to minimize bottlenecks in the system.
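
A rough sketch of that three-tiered lookup, with made-up types (none of this is a real Bigtable client API): the client asks META0 which META1 tablet covers a row, asks that META1 tablet where the user tablet lives, and caches the answer so later lookups skip both hops.

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    /** A toy sketch of the three-tiered tablet-location lookup with client-side caching. */
    public class TabletLocator {

        /** Stand-in for a metadata tablet that maps "table/rowKey" to a location string. */
        public interface MetadataTablet {
            String locate(String tableAndRowKey);
        }

        private final MetadataTablet meta0;                        // there is exactly one META0
        private final Function<String, MetadataTablet> openMeta1;  // opens a META1 tablet by location
        private final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

        public TabletLocator(MetadataTablet meta0, Function<String, MetadataTablet> openMeta1) {
            this.meta0 = meta0;
            this.openMeta1 = openMeta1;
        }

        /** Returns the address of the tablet server holding (table, rowKey). */
        public String locateTablet(String table, String rowKey) {
            String key = table + "/" + rowKey;
            return cache.computeIfAbsent(key, k -> {
                String meta1Location = meta0.locate(k);          // tier 1: META0 -> META1 tablet
                MetadataTablet meta1 = openMeta1.apply(meta1Location);
                return meta1.locate(k);                          // tier 2: META1 -> user tablet server
            });
        }
    }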

Implementation

BigTable is built on Google File System (GFS), which is used as a backing store for log and data files. GFS provides reliable storage for SSTables, a Google-proprietary file format used to persist table data.

Another service that BigTable makes heavy use of is Chubby, a highly available, reliable distributed lock service. Chubby allows clients to take a lock, possibly associating it with some metadata, which they can renew by sending keep-alive messages back to Chubby. The locks are stored in a filesystem-like hierarchical naming structure.
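
The usage pattern can be sketched like this; the LockService interface is a stand-in, since Chubby itself is not publicly available. The point is the shape of the protocol: acquire a named lock with optional metadata, renew it with periodic keep-alives, and release it when done.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    /** A toy sketch of a Chubby-style lock held open by keep-alive messages. */
    public class ChubbyStyleLock implements AutoCloseable {

        /** Stand-in for the lock service; not a real Chubby API. */
        public interface LockService {
            boolean acquire(String path, byte[] metadata);
            void keepAlive(String path);
            void release(String path);
        }

        private final LockService service;
        private final String path;
        private final ScheduledExecutorService renewer = Executors.newSingleThreadScheduledExecutor();

        public ChubbyStyleLock(LockService service, String path, byte[] metadata) {
            this.service = service;
            this.path = path; // e.g. a hierarchical, filesystem-like name such as "/ls/cell/bigtable/master"
            if (!service.acquire(path, metadata)) {
                throw new IllegalStateException("lock already held: " + path);
            }
            // Renew periodically so the server does not expire the lock.
            renewer.scheduleAtFixedRate(() -> service.keepAlive(path), 5, 5, TimeUnit.SECONDS);
        }

        @Override
        public void close() {
            renewer.shutdownNow();
            service.release(path);
        }
    }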

There are three primary server types of interest in the Bigtable system:

  1. Master servers: assign tablets to tablet servers, keep track of where tablets are located, and redistribute tablets as needed.
  2. Tablet servers: handle read/write requests for tablets and split tablets when they exceed size limits (usually 100 MB - 200 MB). If a tablet server fails, then 100 tablet servers each pick up one of its tablets and the system recovers.
  3. Lock servers: instances of the Chubby distributed lock service. Many actions within BigTable require locks, including opening tablets for writing, ensuring that there is no more than one active Master at a time, and access-control checking.

API

Typical BigTable operations include creating and deleting tables and column families, writing data, and deleting columns from a row. BigTable provides these functions to application developers through an API. Transactions are supported at the row level, but not across multiple row keys.
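
For example, with the Cloud Bigtable Java client (google-cloud-bigtable; the project, instance, table, and row key names below are placeholders), creating a table with a column family and writing a row looks roughly like this:

    import com.google.cloud.bigtable.admin.v2.BigtableTableAdminClient;
    import com.google.cloud.bigtable.admin.v2.models.CreateTableRequest;
    import com.google.cloud.bigtable.data.v2.BigtableDataClient;
    import com.google.cloud.bigtable.data.v2.models.RowMutation;

    public class ApiExample {
        public static void main(String[] args) throws Exception {
            // Placeholder project and instance names.
            String projectId = "my-project";
            String instanceId = "my-instance";

            // Table and column-family administration.
            try (BigtableTableAdminClient admin = BigtableTableAdminClient.create(projectId, instanceId)) {
                admin.createTable(CreateTableRequest.of("orders").addFamily("order-family"));
            }

            // Writing data: all mutations in one RowMutation target a single row.
            try (BigtableDataClient data = BigtableDataClient.create(projectId, instanceId)) {
                RowMutation mutation = RowMutation.create("orders", "order#2024-001")
                        .setCell("order-family", "status", "NEW")
                        .setCell("order-family", "amount", "42.00");
                data.mutateRow(mutation);
            }
        }
    }

Because every mutation in a single RowMutation targets one row key, the write commits atomically, which is exactly the row-level transaction guarantee mentioned above.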

528 questions
0 votes, 1 answer

NoSql vs BigTable (comparison of client APIs)

This might sound like a dumb question but I am recently learning about a Big Table. Would someone please tell me the advantage of using Big Table over NoSql databases. I eventually see both of them as semi-structured data storage. Some people…
Kunal Balani
  • 4,739
  • 4
  • 36
  • 73
0 votes, 1 answer

hypertable core data structure

I'm looking for the implementation of the multi-dimensional map (or the LSM tree), but I'm not able to find out which class corresponds to its implementation. Does anyone know? Thanks!
realjin
  • 1,485
  • 1
  • 19
  • 38
0 votes, 1 answer

App-engine query

I am new to the app-engine Datastore and to the NoSQL world in general. I am developing a simple application where a user can declare his/her expenses every day. Every user (Account) has their own declared expenses. The dashboard contains a simple GWT Cell Tree…
Adelin
  • 18,144
  • 26
  • 115
  • 175
0 votes, 3 answers

BigTable with C# Library

Is there any sort of LinqToBigTable library out there or anything that makes it link up with C#? I am looking to integrate with App Engine BigTable.
naspinski
  • 34,020
  • 36
  • 111
  • 167
0 votes, 1 answer

how to design a users table in hypertable

I want to design a users table with the following fields in a hypertable database: rowkey: (unique Guid), username: (unique in the table), email: (unique in the table), passwordHash: (string field), passwordSalt: (string…
ygaradon
  • 2,198
  • 2
  • 21
  • 27
0 votes, 1 answer

HBase schema row key design - increment counter?

I am struggling to find any documents about increment counters in HBase. Does anyone know of any? I am designing an HBase table schema for my application. My row_key can't guarantee 100% uniqueness. So the question is, when my row_key starts having…
Shengjie
  • 12,336
  • 29
  • 98
  • 139
-1 votes, 1 answer

How to setup staging, pre-prod for google dataflow jobs?

Say we have a dataflow job: Written in Apache Beam Java SDK which is a Gradle project. Uses pubsub stream as input, writes results to bigtable and writes logs to BigQuery. As with deploying a server, we can easily have a staging, pre-prod and prod…
-1 votes, 7 answers

Design ideas for serving up high-frequency data

I want to build something to store and serve up time series data, which is coming in from a variety of sources at different time intervals. This includes both raw data and computed data. For example, let's say I want to log an every-30-seconds…
bobsmith
-1 votes, 1 answer

Is BigTable appropriate for inserting single rows very frequently?

We have a streaming solution that takes messages from a pubsub topic and uses DataFlow to stream each message into a BigQuery table. This is a very appropriate use case for BigQuery. We would also like to take a subset of those messages and make…
jamiet
  • 10,501
  • 14
  • 80
  • 159
-1 votes, 1 answer

Efficient way of deleting an empty row from google bigtable

We have set expiry for columns in bigtable. Over a period of time, the number of rows not holding any data (only keys) has increased. I am looking for an efficient way to delete these empty rows from a table. For example: key: key1 column1:…
deep
  • 31
  • 5
-1 votes, 1 answer

Is there a way in R such that the value of a column should be the one above if it meets a certain criterion in another column

I want the value in the myrate column to be as follows: the first value of myrate should be rupee (minus) amt; the second row of the myrate column should be the first value of myrate (the value generated in point 1) minus the second value of the Rupee column if 'Name'…
-1 votes, 1 answer

Reading BigTable and converting to Generic Records using GCP Cloud DataFlow

I am trying to convert BigTable table data to generic records using Dataflow. After the conversion is done, I have to compare with other datasets in a bucket. Below is my pseudo code; for the pipeline I have used pipeline.apply("Read from…
-2 votes, 1 answer

Bigtable data is removed automatically 30 minutes after insertion

I have a table in Bigtable named "orders" with one column family "order-family". It returns this configuration Column Family: order-family GC Rule: {"gcRule":{"maxAge":"86400s"}}. I can insert data into the "orders" table, but after 30 minutes the…
-2 votes, 3 answers

How to download big table like MySQL into my pc?

How to download big table like MySQL into my pc?
zjm1126
  • 34,604
  • 53
  • 121
  • 166
-2 votes, 2 answers

Which DBs support the Apache HBase 1.0 API?

I know both Zookeeper and Google Bigtable support the Apache HBase 1.0 API; are there more?
Bob van Luijt
  • 7,153
  • 12
  • 58
  • 101