I'm trying to write a component that fetches rows from HBase from the last 5 days (5 is arbitrary). The timestamp I want to use is the default timestamp HBase gives to rows (unless it's problematic for some reason).

I know I can use scan with a timestamp range, but I'm not quite sure how to get the current date in HBase (I'm currently testing this in the HBase shell, but eventually I need code that does it). I've tried something like this:

scan 'urls', {COLUMNS => 'urls', TIMERANGE => [SimpleDateFormat.new("yy/MM/dd HH:mm:ss").parse("2016/03/02 00:00:00", ParsePosition.new(0)).getTime(), new Date().getTime()]}

but the shell reports a syntax error: unexpected tCONSTANT. I imported Date, SimpleDateFormat and ParsePosition successfully.

I also looked at other examples but could not find exactly what I needed

I was also wondering if there is a more elegant way to accomplish this task?

Thanks in advance

Gideon

1 Answer

In the HBase shell you can use the TIMERANGE option of scan. Example from the scan command's help:

hbase> scan 't1', {COLUMNS => 'c1', TIMERANGE => [1303668804, 1303668904]}

For the Java client, you can set the time range on the Scan object:

Scan s = new Scan();
s.setTimeRange(1303668804L, 1303668904L);
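
To cover "the last N days" from the question, the two bounds can be computed from the current time rather than hard-coded. Below is a minimal sketch, assuming the cells carry the default HBase timestamps (epoch milliseconds). The class name LastDaysRange and the helper lastDays are hypothetical names made up for this illustration, and the HBase calls are left as comments so the snippet runs without the client library. As far as I know the upper bound of setTimeRange is exclusive, so a cell written in the very same millisecond as "now" would not be returned.

```java
import java.time.Duration;
import java.time.Instant;

class LastDaysRange {
    /**
     * Hypothetical helper: returns {minStamp, maxStamp} in epoch milliseconds,
     * spanning the given number of days up to the instant `now`.
     */
    public static long[] lastDays(Instant now, int days) {
        long maxStamp = now.toEpochMilli();
        long minStamp = now.minus(Duration.ofDays(days)).toEpochMilli();
        return new long[] {minStamp, maxStamp};
    }

    public static void main(String[] args) {
        // 5 is arbitrary, as in the question.
        long[] range = lastDays(Instant.now(), 5);
        System.out.println(range[0] + " .. " + range[1]);

        // With the HBase client on the classpath, the range would be applied as:
        //   Scan s = new Scan();
        //   s.setTimeRange(range[0], range[1]);
    }
}
```

Passing the current instant in as a parameter keeps the helper easy to check with a fixed time; in real code Instant.now() would be the usual argument.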
AdamSkywalker
  • I've seen this exact example in some of the links I posted, however, it's not what I need. I'm trying to fetch all the rows from the last N days and I'm not sure how to do this with HBase – Gideon Mar 08 '16 at 09:01
  • What's not working is that I'm not sure how to get the data up to today. It seems that getting today's date is not possible just by doing new Date().getTime() – Gideon Mar 08 '16 at 10:06
  • @Gideon if there's a problem generating the current date, why not just use a bigger number, like a date in the year 2050, since the current date is only an upper bound? – AdamSkywalker Mar 08 '16 at 10:44
  • I don't mind doing that but I was just wondering if there's a more non-patchy way – Gideon Mar 08 '16 at 12:07
  • I forgot to accept this, but I ended up doing what you suggested. Thanks! – Gideon Dec 13 '16 at 13:11
  • Will this perform a full table scan, filtering rows as it goes? – Aviv Cohn Mar 10 '17 at 02:56
  • @AvivCohn yes, but there are some optimizations: for example, an HFile records the min and max timestamps of its cells, so it can be skipped entirely if it does not match the query – AdamSkywalker Mar 10 '17 at 07:19