
I am looking for a way to find the latest rows in an HBase table that is loaded by Nutch 2.3.

I use happybase over Thrift. The only example I have found is at this link: https://happybase.readthedocs.io/en/happybase-0.4/tutorial.html#using-table-namespaces

Ram Ghadiyaram
Hakim

1 Answer


I don't know Python, so I'm explaining this in the HBase shell. You should be able to do more or less the same thing in Python.

Python way: Convert timestamp since epoch to datetime.datetime

How do you get the latest timestamp to pass in the filter?

Log data to timestamp: to convert the date '08/08/16 20:56:29' from an HBase log into a timestamp, do:

    hbase(main):021:0> import java.text.SimpleDateFormat
    hbase(main):022:0> import java.text.ParsePosition
    hbase(main):023:0> SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("08/08/16 20:56:29", ParsePosition.new(0)).getTime()
    => 1218920189000
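A rough Python counterpart of the shell snippet above, as a sketch only. The function names are made up for illustration, and the log date is assumed to be in UTC, which is what the shell result shown above corresponds to (`SimpleDateFormat` itself uses the JVM's default time zone).

```python
from datetime import datetime, timezone

def log_date_to_millis(text, fmt="%y/%m/%d %H:%M:%S"):
    """Parse a log date and return milliseconds since the Unix epoch.

    Assumption: the date string is in UTC.
    """
    parsed = datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp()) * 1000

def millis_to_datetime(ts_ms):
    """Convert a millisecond timestamp back to an aware UTC datetime."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)

print(log_date_to_millis("08/08/16 20:56:29"))
print(millis_to_datetime(1218920189000).isoformat())
```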

Once you have the timestamps, you can try something like this, passing the lower and upper bounds to TIMERANGE:

    scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}


hbase(main):001:0> scan

Here is some help for this command: Scan a table; pass table name and optionally a dictionary of scanner specifications. Scanner specifications may include one or more of: TIMERANGE, FILTER, LIMIT, STARTROW, STOPROW, TIMESTAMP, MAXLENGTH, or COLUMNS, CACHE

Some examples:

  hbase> scan '.META.'
  hbase> scan '.META.', {COLUMNS => 'info:regioninfo'}
  hbase> scan 't1', {COLUMNS => ['c1', 'c2'], LIMIT => 10, STARTROW => 'xyz'}
  hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}
  hbase> scan 't1', {FILTER => "(PrefixFilter ('row2') AND (QualifierFilter (>=, 'binary:xyz'))) AND (TimestampsFilter ( 123, 456))"}
  hbase> scan 't1', {FILTER => org.apache.hadoop.hbase.filter.ColumnPaginationFilter.new(1, 0)}
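On the Python side, the `FILTER` strings in the examples above could in principle be passed through happybase's `filter` argument to `scan`. The helper below only builds such a string; `timestamps_filter` is a hypothetical name, and the commented-out happybase call is an assumption (host and table name are placeholders, and some happybase versions may expect bytes rather than str for the filter).

```python
def timestamps_filter(*timestamps_ms):
    """Build a filter string that matches cells at exactly these timestamps."""
    if not timestamps_ms:
        raise ValueError("at least one timestamp is required")
    joined = ", ".join(str(int(t)) for t in timestamps_ms)
    return "TimestampsFilter ({})".format(joined)

print(timestamps_filter(123, 456))

# Possible use with happybase (assumed setup, not run here):
# import happybase
# connection = happybase.Connection("localhost")   # placeholder host
# table = connection.table("t1")                   # placeholder table
# for key, data in table.scan(filter=timestamps_filter(123, 456)):
#     print(key, data)
```

Note that this matches exact timestamp values only, not a range.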
Ram Ghadiyaram
  • But I can't have happybase use a time range because it's not supported. – Hakim Oct 29 '16 at 18:23
  • Hakim: isn't there something called a timestamp filter in happybase? – Ram Ghadiyaram Oct 29 '16 at 18:27
  • Hakim, I think with this syntax (`scan(row_start=None, row_stop=None, row_prefix=None, columns=None, filter=None, timestamp=None, include_timestamp=False, batch_size=1000, scan_batching=None, limit=None, sorted_columns=False)`) you can pass filters through happybase, can't you? [Please check](http://happybase.readthedocs.io/en/latest/api.html) – Ram Ghadiyaram Oct 29 '16 at 20:54
  • Was my answer useful? Did you find anything else apart from ? – Ram Ghadiyaram Nov 07 '16 at 12:00
  • Sorry for being late. The timestamp filter uses a unique timestamp value; it can't use a time range. – Hakim Nov 20 '16 at 07:24
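Given the limitation described in the last comment, one possible workaround is to emulate a time range on the client: scan with `include_timestamp=True` (a parameter listed in the `scan` signature quoted above) and keep only cells whose timestamp falls in the wanted range. This is a sketch, not a tested recipe. `rows_in_time_range` is a hypothetical helper, the host and table name in the comments are placeholders, and filtering on the client reads every scanned row over Thrift, so it may be slow on large tables.

```python
def rows_in_time_range(scan_results, min_ts, max_ts):
    """Yield (row_key, cells) keeping only cells with min_ts <= ts < max_ts.

    scan_results is any iterable of (row_key, {column: (value, ts)}) pairs,
    the shape assumed here for table.scan(include_timestamp=True).
    """
    for row_key, cells in scan_results:
        kept = {
            column: (value, ts)
            for column, (value, ts) in cells.items()
            if min_ts <= ts < max_ts
        }
        if kept:  # skip rows with no cell inside the range
            yield row_key, kept

# Possible use with happybase (assumed setup, not run here):
# import happybase
# connection = happybase.Connection("localhost")   # placeholder host
# table = connection.table("webpage")              # placeholder table name
# scan = table.scan(include_timestamp=True)
# for key, cells in rows_in_time_range(scan, 1303668804, 1303668904):
#     print(key, cells)
```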