
I'm new to HBase and need help. I have a table with some data in HBase:

Id Name Address
1  john XX-XX
2  mike XXX-XX

The Id should auto-increment. Now I have to insert data into the table so that if we insert 10 more records, the Id increments up to 12, like this:

Id Name Address
1  john XX-XX
2  mike XXX-XX
3  foo   XXXX
...
...
12 booo  xxx

Maybe something like a sequence generator in HBase? Can someone help me with the code?

zx485
  • are you working in the shell or in java code? – Martin Serrano Nov 20 '16 at 00:58
  • Java code. Can we do this with the counters concept in HBase? If so, how can we do it? –  Nov 20 '16 at 01:05
  • Read about row keys, their distribution across the region servers, and how bad sequential row keys would be because of region hotspotting. You can create a sequential value for a column family using any programming language that has HBase drivers. Also read this answer; the question may be a duplicate of this one - http://stackoverflow.com/questions/26890944/hbase-auto-increment-any-column-row-key – Sergey Benner Nov 20 '16 at 01:06

1 Answer


HBase does not have sequence generators. And as Sergey comments, using a sequence as a row key is not recommended, because consecutive keys all land in the same region and turn it into a hotspot. Such a requirement should be analyzed carefully. If you do end up needing one, a salted key approach is recommended.
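One common way to salt, sketched under assumptions: the bucket count `BUCKETS` and the helpers `saltedKey`/`idFromKey` are hypothetical, not an HBase API. A one-byte prefix derived from the id is prepended to the 8-byte big-endian id, so consecutive ids are spread over several key ranges instead of all hitting the region that holds the end of the table. It uses only the JDK; with the HBase client the resulting bytes would be the row passed to `new Put(rowKey)`.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class SaltedKeys {
    // Hypothetical bucket count; often chosen to match the number of pre-split regions.
    static final int BUCKETS = 16;

    // Builds a 9-byte row key: 1 salt byte followed by the id as a big-endian long.
    static byte[] saltedKey(long id) {
        // Long.hashCode folds the high and low 32 bits; floorMod keeps the bucket non-negative.
        int bucket = Math.floorMod(Long.hashCode(id), BUCKETS);
        return ByteBuffer.allocate(1 + Long.BYTES)
                .put((byte) bucket)
                .putLong(id)
                .array();
    }

    // Recovers the original id by skipping the salt byte.
    static long idFromKey(byte[] key) {
        return ByteBuffer.wrap(key, 1, Long.BYTES).getLong();
    }

    public static void main(String[] args) {
        for (long id = 1; id <= 4; id++) {
            byte[] key = saltedKey(id);
            System.out.println(id + " -> bucket " + key[0] + ", key " + Arrays.toString(key));
        }
    }
}
```

The trade-off: reading rows back in id order now needs one scan per bucket, merged on the client, because the ids are no longer contiguous in the key space.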

HBase does support global counters (increment operations), which can be used to generate sequences. However, they cannot be used atomically to generate the key of a row being added: you have to increment and get the value, then put the new row. That takes 2 RPCs, and you can get gaps if the subsequent put fails.
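A minimal sketch of the increment-then-put flow and the gap it can leave. To keep it runnable without a cluster, `Counter` and `RowStore` are hypothetical in-memory stand-ins; against a real table the first step would be a counter increment such as `Table.incrementColumnValue(row, family, qualifier, 1)` and the second a `Table.put(...)`, i.e. the 2 RPCs. The name "fail" is a hypothetical trigger for a failed put.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicLong;

public class CounterThenPut {
    // Hypothetical stand-in for the counter cell (RPC #1: atomic increment).
    interface Counter {
        long incrementAndGet(long amount);
    }

    // Hypothetical stand-in for the table (RPC #2: the put).
    interface RowStore {
        void put(long id, String name) throws Exception;
    }

    // Reserves an id, then writes the row. The two steps are not one atomic operation,
    // so if put() throws, the id already taken from the counter is never reused.
    static long insert(Counter counter, RowStore store, String name) throws Exception {
        long id = counter.incrementAndGet(1);
        store.put(id, name);
        return id;
    }

    static Map<Long, String> demo() {
        AtomicLong cell = new AtomicLong(2); // ids 1 and 2 are already used (john, mike)
        Counter counter = cell::addAndGet;
        Map<Long, String> rows = new TreeMap<>();
        RowStore store = (id, name) -> {
            if (name.equals("fail")) {
                throw new Exception("simulated put failure");
            }
            rows.put(id, name);
        };
        for (String name : new String[] {"foo", "fail", "bar"}) {
            try {
                insert(counter, store, name);
            } catch (Exception e) {
                System.out.println("put failed: " + e.getMessage());
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // {3=foo, 5=bar}: id 4 was burned by the failed put
    }
}
```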

When we do use counters in this way, we use salted keys, accept that gaps can occur, and increment by blocks to avoid an RPC for every key needed.
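Incrementing by blocks could look roughly like this. It is a sketch: `BlockSequence` is a hypothetical helper, and the counter is passed in as a function so the example runs without a cluster; against a real table that function would wrap something like `Table.incrementColumnValue(row, family, qualifier, blockSize)`. One increment reserves `blockSize` ids, which are then handed out locally. Starting from the question's two existing rows, ten inserts get ids 3 to 12 with a single increment.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

public class BlockSequence {
    private final LongUnaryOperator reserve; // adds 'amount' to the shared counter, returns the new value
    private final long blockSize;
    private long next = 1;  // next id to hand out
    private long limit = 0; // last id of the current block; next > limit means the block is used up

    BlockSequence(LongUnaryOperator reserve, long blockSize) {
        this.reserve = reserve;
        this.blockSize = blockSize;
    }

    synchronized long nextId() {
        if (next > limit) {
            // One round trip reserves the ids (newValue - blockSize, newValue].
            limit = reserve.applyAsLong(blockSize);
            next = limit - blockSize + 1;
        }
        return next++;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory counter cell; ids 1 and 2 are already taken (john, mike).
        AtomicLong counter = new AtomicLong(2);
        BlockSequence seq = new BlockSequence(counter::addAndGet, 10);
        StringBuilder ids = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            ids.append(seq.nextId()).append(' ');
        }
        System.out.println(ids.toString().trim()); // 3 4 5 6 7 8 9 10 11 12
        System.out.println("counter = " + counter.get()); // 12, after one increment of 10
    }
}
```

Any ids left in a block when the client shuts down are never used, which is one more source of gaps; and before being used as row keys the ids would still be salted.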

Martin Serrano
  • What about counters: https://cloudfront.blogspot.fr/2012/06/hbase-counters-part-i.html ? – Adrien Brunelat Feb 23 '18 at 13:11
  • Yes, one could use a counter, but this still results in bad key distribution and means you have a 2-pass update (one to get the sequence id, one to use it), which is a performance hit and leaves a potential gap in the keys used (which may not matter) if the second update fails. – Martin Serrano Feb 23 '18 at 15:24
  • I agree, it's bad key distribution. Still, that means HBase does have sequence generators; maybe that was added after your original answer? – Adrien Brunelat Feb 23 '18 at 15:37
  • Not IMO, but I see your point. I suppose it depends on your definition of sequence generator. Typically a database sequence generator will do the auto-increment for you within the insert transaction guaranteeing continuity of ids (no unused values), etc. – Martin Serrano Feb 23 '18 at 15:42
  • Oh ok, I get it. Maybe we could add details pointing towards counters to this post then. Even if it's questionable design in most cases, it's sometimes necessary. Personally, that would have saved me a lot of time. (Plus for people looking for counters, that could give this answer more upvotes!) – Adrien Brunelat Feb 23 '18 at 15:45