4

When I worked with HBase, I spent a lot of time converting byte arrays into types like String or Long. Why does HBase store values as byte arrays instead of typed values?

2 Answers

12

I don't think "HBase stores everything as a byte[] because Bigtable does" is actually a satisfying answer. My 2 cents:

It lets us store any kind of data without much fuss. For example, imagine you have to store product data in your HBase table: ID, make, country, price, etc. With a typed store you would have to declare the datatype of each of these fields in advance, which adds overhead. Unlike an RDBMS, HBase doesn't ask for any of this at table creation time. So even if the datatypes of these fields change tomorrow, or you decide to add fields with new datatypes, all you have to do is wrap the value in Bytes.toBytes() and push it into your table. Because HBase treats every value as opaque bytes and never has to check it against a schema, this keeps insertions fast.
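To make the conversion step concrete, here is a minimal sketch using only the JDK. The class and method names (`CellCodec`, `encode`, `decodeLong`, `decodeString`) are made up for illustration. They are meant to mirror what HBase's `Bytes.toBytes(long)`, `Bytes.toLong(byte[])`, `Bytes.toBytes(String)` and `Bytes.toString(byte[])` do (8 big-endian bytes for a long, UTF-8 for a String), so treat it as a stand-in rather than the real client code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class CellCodec {
    // Encode a long as 8 big-endian bytes (ByteBuffer's default byte order).
    public static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Decode 8 big-endian bytes back into a long.
    public static long decodeLong(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Encode a String as UTF-8 bytes.
    public static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    // Decode UTF-8 bytes back into a String.
    public static String decodeString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical product fields; the store itself would only ever see byte[].
        byte[] price = encode(1999L);
        byte[] country = encode("Germany");

        System.out.println(decodeLong(price));     // prints 1999
        System.out.println(decodeString(country)); // prints Germany
    }
}
```

The point is that the table never needs to know which of these is a number and which is text; only the code that reads the cell has to pick the right decoder.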

Also, storing a value in serialized byte[] form sometimes saves a few bytes compared to storing the same value in its native format. This minor saving becomes quite significant when you deal with big data.
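One way to read the size argument is to compare a binary-encoded long with the same number written out as decimal text. The sketch below does that with plain JDK calls. The class name `EncodedSize` and the sample timestamp are assumptions for illustration, and the result depends on the value: a small number such as 7 is shorter as text than as a fixed 8-byte long.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class EncodedSize {
    // Size of a long stored as fixed-width binary (always 8 bytes).
    public static int binarySize(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array().length;
    }

    // Size of the same long stored as its decimal digits in UTF-8.
    public static int textSize(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        long timestamp = 1398156060000L; // hypothetical epoch-millisecond value

        System.out.println(binarySize(timestamp)); // prints 8
        System.out.println(textSize(timestamp));   // prints 13
    }
}
```

Five bytes per cell is nothing on its own, but it is multiplied by every cell in the table.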

Long story short, HBase does this to make things faster and to make storage more efficient, keeping the overhead of internal data structures to a minimum.

Tariq
  • Please can anyone tell me how to store data types in hbase and retrieve them? I'm really new to hbase and please help me out – Chamika Kasun Apr 22 '14 at 08:41
  • What do you mean by **how to store data types in hbase and retrieve them**? Also, it would be better if you asked it as a new question. This section is for comments. – Tariq Apr 22 '14 at 09:06
  • http://stackoverflow.com/questions/23215492/how-to-store-primitive-data-types-in-hbase-and-retrieve Here is the link, please help me out :) – Chamika Kasun Apr 22 '14 at 09:24
1

HBase is a Bigtable clone, and that's what Bigtable does. Bigtable typically does not store fine-grained data the way a relational database does; it stores serialized objects, typically protocol buffers.

You can either use the serialized-object approach, or wrap the HBase library behind your own interface so that you only convert your types in a single place.
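Here is a rough sketch of the second option, converting types in a single place. Everything in it is hypothetical: `TypedColumns`, `Codec`, `Column` and `Row` are names invented for this example, and `Row` is an in-memory map standing in for a real HBase row so that the snippet runs without the HBase client on the classpath. In real code the `put` and `get` methods would delegate to the HBase API instead of a `HashMap`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class TypedColumns {
    /** Pairs an encoder and a decoder for one value type. */
    public static final class Codec<T> {
        final Function<T, byte[]> encode;
        final Function<byte[], T> decode;

        public Codec(Function<T, byte[]> encode, Function<byte[], T> decode) {
            this.encode = encode;
            this.decode = decode;
        }
    }

    // The only place in the program that knows how types map to bytes.
    public static final Codec<String> STRING = new Codec<>(
            s -> s.getBytes(StandardCharsets.UTF_8),
            b -> new String(b, StandardCharsets.UTF_8));

    public static final Codec<Long> LONG = new Codec<>(
            v -> ByteBuffer.allocate(Long.BYTES).putLong(v).array(),
            b -> ByteBuffer.wrap(b).getLong());

    /** A column name bound to the codec for its value type. */
    public static final class Column<T> {
        final String name;
        final Codec<T> codec;

        public Column(String name, Codec<T> codec) {
            this.name = name;
            this.codec = codec;
        }
    }

    /** Hypothetical in-memory stand-in for one row; it only ever holds byte[]. */
    public static final class Row {
        private final Map<String, byte[]> cells = new HashMap<>();

        public <T> void put(Column<T> column, T value) {
            cells.put(column.name, column.codec.encode.apply(value));
        }

        public <T> T get(Column<T> column) {
            byte[] raw = cells.get(column.name);
            return raw == null ? null : column.codec.decode.apply(raw);
        }
    }

    public static void main(String[] args) {
        Column<Long> price = new Column<>("price", LONG);
        Column<String> make = new Column<>("make", STRING);

        Row row = new Row();
        row.put(price, 1999L);
        row.put(make, "Acme");

        long p = row.get(price);   // no cast or manual byte conversion at the call site
        String m = row.get(make);
        System.out.println(m + " costs " + p); // prints Acme costs 1999
    }
}
```

Callers work with `Long` and `String` directly, and adding a new type means adding one more `Codec` constant rather than touching every read and write.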

Jacob Groundwater