
I have the following scenario in my HBase instance:

hbase(main):002:0> create 'test', 'cf'
0 row(s) in 1.4690 seconds

hbase(main):003:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.1480 seconds

hbase(main):004:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0070 seconds

hbase(main):005:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0120 seconds

hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value4'
0 row(s) in 0.0070 seconds

As you can see, the last two puts target the same row key, column family, and column. If I understand HBase correctly, row3 + cf:c identifies a single cell that holds all timestamped versions of the values inserted into it.

But a simple scan returns only the most recent value:

hbase(main):010:0> scan 'test'       
ROW                   COLUMN+CELL                                               
 row1                 column=cf:a, timestamp=1317945279379, value=value1        
 row2                 column=cf:b, timestamp=1317945285731, value=value2        
 row3                 column=cf:c, timestamp=1317945301466, value=value4        
3 row(s) in 0.0250 seconds

How do I get all timestamped values for a cell, or perform a time-range query?

FUD

3 Answers


To see the versions of a column, you need to specify how many versions you want:

scan 'test', {VERSIONS => 3}

will return up to 3 versions of each column if that many are stored (for row3, cf:c that means both value4 and value3). You can use it in get as well:

get 'test', 'row3', {COLUMN => 'cf:c', VERSIONS => 3}
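To illustrate what VERSIONS changes, here is a toy in-memory model in Python. It is not HBase client code: the class and method names are hypothetical, a counter stands in for the server-assigned timestamp, and it assumes the column family retains every version written. Like the shell, a read returns only the newest version by default, and up to `versions` of them, newest first, when asked.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for one HBase table (not the real API)."""

    def __init__(self):
        # (row, column) -> {timestamp: value}; each entry is one cell
        self.cells = defaultdict(dict)
        self.clock = 0

    def put(self, row, column, value, ts=None):
        # HBase stamps a put with the server time when no timestamp is given;
        # here a simple counter plays that role.
        if ts is None:
            self.clock += 1
            ts = self.clock
        self.cells[(row, column)][ts] = value

    def get(self, row, column, versions=1):
        # Newest first, like the shell output; the default of 1 mirrors a plain get/scan.
        stored = self.cells.get((row, column), {})
        newest_first = sorted(stored.items(), reverse=True)
        return newest_first[:versions]


t = ToyTable()
t.put('row3', 'cf:c', 'value3')
t.put('row3', 'cf:c', 'value4')
print(t.get('row3', 'cf:c'))              # [(2, 'value4')]
print(t.get('row3', 'cf:c', versions=3))  # [(2, 'value4'), (1, 'value3')]
```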

To get the value at a specific timestamp, you can use TIMESTAMP as well:

get 'test', 'row3', {COLUMN => 'cf:c', TIMESTAMP => 1317945301466}

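If you need values "between" two timestamps, use a time range: get and scan accept TIMERANGE => [min, max], where min is inclusive and max is exclusive (add VERSIONS to get more than the newest version in the range). TimestampsFilter, by contrast, only matches an explicit list of timestamps.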
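The difference between an exact TIMESTAMP lookup and a time range can be sketched with a single cell modelled as a Python dict from timestamp to value. This is a hypothetical toy, not HBase code, and the timestamps 100, 200 and 300 are made up. The range follows HBase's TimeRange convention (lower bound inclusive, upper bound exclusive), and it returns every matching version, which in the shell corresponds to also passing a large enough VERSIONS.

```python
def at_timestamp(cell, ts):
    """Like TIMESTAMP => ts: the value written at exactly ts, or nothing."""
    return [cell[ts]] if ts in cell else []


def in_time_range(cell, min_ts, max_ts):
    """Like TIMERANGE => [min_ts, max_ts] with enough VERSIONS:
    every version with min_ts <= ts < max_ts, newest first."""
    hits = [(ts, value) for ts, value in cell.items() if min_ts <= ts < max_ts]
    return sorted(hits, reverse=True)


# One cell (e.g. row3 / cf:c) holding three hypothetical versions.
cell = {100: 'old', 200: 'mid', 300: 'new'}
print(at_timestamp(cell, 300))        # ['new']
print(in_time_range(cell, 100, 300))  # [(200, 'mid'), (100, 'old')]
```

Because the upper bound is exclusive, the version written at 300 is left out of the range; an upper bound of 301 would include it.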

frail
  • Not that it's the case here, but you should also make sure the table supports multiple versions, i.e. give the CF a VERSIONS parameter. – Tony Oct 07 '11 at 17:06
  • Indeed, you are right @Tony. Creating the table with "create 'test', {NAME => 'cf', VERSIONS => N}" (default is 3) would be good practice. Versions apply to column families, not tables, so you should set VERSIONS on every column family in the table. – frail Oct 07 '11 at 17:16
  • I wonder if there's any way to tell 'scan' to retrieve all the existing versions, instead of setting a threshold – Diego Pino Oct 27 '12 at 10:12
  • @Tony, frail, so versioning is not enabled by default? One must specify it as part of the column family definition. Am I correct? – Kaushik Lele Sep 23 '15 at 09:00
  • @KaushikLele you can check http://hbase.apache.org/book.html#versions for more detail. As an answer to your question: "Prior to HBase 0.96, the default number of versions kept was 3, but in 0.96 and newer has been changed to 1." – frail Sep 23 '15 at 10:10
  • Yes, frail, thanks. I observed the same: the default version count is 1. I studied it a little more and shared my observations here: http://stackoverflow.com/a/32736636/1122841 – Kaushik Lele Sep 23 '15 at 10:24
  • Is it possible to use `get` to pull all the versions of multiple columns for a specific row key? Or do I have to use `scan` to get the history of value changes for all the columns? – notilas Feb 04 '19 at 19:15

To change the number of versions a column family retains, use the following command:

 alter 'test', NAME=>'cf', VERSIONS=>2

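then add another entry (the output below also shows an earlier version, value1d, that had already been put into the same cell):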

put 'test', 'row1', 'cf:a2', 'value1e'

then see the different versions:

get 'test', 'row1', {COLUMN => 'cf:a2', VERSIONS => 2}

would return something like:

COLUMN                        CELL                                                                                
 cf:a2                        timestamp=1457947804214, value=value1e                                              
 cf:a2                        timestamp=1457947217039, value=value1d                                              
2 row(s) in 0.0090 seconds
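Retention is a property of the column family, separate from how many versions a read asks for. Below is a toy Python model of that, assuming the 0.96+ default of 1 retained version; the class and method names, the timestamps 1 to 3, and the third value 'value1f' are hypothetical. It trims extra versions as soon as they are written, which is a simplification: real HBase hides them on read and removes them later, during compaction.

```python
class ToyColumnFamily:
    """Hypothetical model of one column family's version retention (not HBase code)."""

    def __init__(self, max_versions=1):
        self.max_versions = max_versions  # plays the role of VERSIONS set via create/alter
        self.cells = {}  # (row, qualifier) -> {timestamp: value}

    def put(self, row, qualifier, ts, value):
        cell = self.cells.setdefault((row, qualifier), {})
        cell[ts] = value  # same timestamp: the value is replaced
        # Keep only the newest max_versions timestamps (simplification:
        # real HBase drops the extras during compaction).
        for old_ts in sorted(cell, reverse=True)[self.max_versions:]:
            del cell[old_ts]

    def get(self, row, qualifier, versions=1):
        # A read never returns more versions than the family retains.
        cell = self.cells.get((row, qualifier), {})
        limit = min(versions, self.max_versions)
        return sorted(cell.items(), reverse=True)[:limit]


cf = ToyColumnFamily()  # default: 1 version retained
cf.put('row1', 'a2', 1, 'value1d')
cf.put('row1', 'a2', 2, 'value1e')
print(cf.get('row1', 'a2', versions=2))  # [(2, 'value1e')]

cf2 = ToyColumnFamily(max_versions=2)  # as after VERSIONS=>2
cf2.put('row1', 'a2', 1, 'value1d')
cf2.put('row1', 'a2', 2, 'value1e')
print(cf2.get('row1', 'a2', versions=2))  # [(2, 'value1e'), (1, 'value1d')]
cf2.put('row1', 'a2', 3, 'value1f')
print(cf2.get('row1', 'a2', versions=2))  # [(3, 'value1f'), (2, 'value1e')]
```

Asking for more versions than the family keeps (VERSIONS => 2 against a family with VERSIONS => 1) simply returns fewer results, which is why the comments above stress setting VERSIONS on the column family first.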

Here is a link for more details: https://learnhbase.wordpress.com/2013/03/02/hbase-shell-commands/.

slm
timmy_stapler

The row key 'row3' used for value4 in cf:c should be unique; otherwise the earlier value gets overwritten:

hbase(main):052:0> scan 'mytable' , {COLUMN => 'cf1:1', VERSION => 3}
ROW                         COLUMN+CELL                                                                   
 1234                       column=cf1:1, timestamp=1405796300388, value=hello                            
1 row(s) in 0.0160 seconds

hbase(main):053:0> put 'mytable', 1234, 'cf1:1', 'wow!'
0 row(s) in 0.1020 seconds

Column 1 of cf1, which had the value 'hello', is overwritten by the second put with the same row key 1234 and the value 'wow!':

hbase(main):054:0> scan 'mytable', {COLUMN => 'cf1:1', VERSION => 3}
ROW                   COLUMN+CELL                                               
 1234                 column=cf1:1, timestamp=1405831703617, value=wow!         
2 row(s) in 0.0310 seconds

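The next put uses a different row key, 123, with the new value 'hey' for column 1 of cf1. Because 123 and 1234 are separate rows, nothing is overwritten, and the scan now shows both 'hey' and 'wow!', one per row. Note that the rows are listed in ascending row-key order (123 sorts before 1234), not by version or timestamp.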

hbase(main):055:0> put 'mytable', 123, 'cf1:1', 'hey'

hbase(main):004:0> scan 'mytable', {COLUMN => 'cf1:1', VERSION => 3}
ROW                   COLUMN+CELL                                               
 123                  column=cf1:1, timestamp=1405831295769, value=hey          
 1234                 column=cf1:1, timestamp=1405831703617, value=wow!         
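A toy Python model of what this session shows, assuming the column keeps a single version (VERSIONS => 1, the default since 0.96) and that the shell stored 1234 and 123 as the strings "1234" and "123", as the scan output suggests. The function name is hypothetical and this is not HBase client code: a put to an existing row key replaces the visible value, a different row key creates a separate row, and a scan returns rows sorted by the raw bytes of the row key.

```python
def scan_single_version(puts):
    """Hypothetical toy model of one column with VERSIONS => 1.

    puts: (row_key, value) pairs in the order they were written.
    Returns what a scan would show: one (row_key, value) pair per row.
    """
    latest = {}
    for row, value in puts:
        latest[row.encode('utf-8')] = value  # same row key: the old value is replaced
    # Scans return rows in lexicographic order of the row key's bytes.
    return [(key.decode('utf-8'), latest[key]) for key in sorted(latest)]


print(scan_single_version([('1234', 'hello'), ('1234', 'wow!'), ('123', 'hey')]))
# [('123', 'hey'), ('1234', 'wow!')]
print(scan_single_version([('9', 'a'), ('10', 'b')]))
# [('10', 'b'), ('9', 'a')]  (byte order, not numeric order)
```

The second call shows why numeric row keys stored as strings can scan in a surprising order: "10" sorts before "9" because the bytes are compared left to right.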
Suresh Vadali