I have a hierarchical row-key design in which each segment of the key is the ID of a field (we use 4-byte segments, but I will stick to two digits per segment for readability).
For example:
00
0000 = child of 00
000000 = child of 0000
0001 = child of 00
000100 = child of 0001
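To make the parent/child rule above concrete, here is a minimal Ruby sketch (the HBase shell is JRuby-based, so it can be pasted there as well). It only models the key layout in memory and does not touch HBase; `children_of` and `KEYS` are hypothetical names, and the segment width of 2 reflects the simplified two-digit keys of the listing, not the real 4-byte segments.

```ruby
# Sample keys from the listing above (two characters per segment).
KEYS = %w[00 0000 000000 0001 000100].freeze

# Hypothetical helper: a direct child shares the parent's prefix
# and is exactly one segment longer than the parent.
def children_of(keys, parent, segment_width)
  keys.select do |key|
    key.start_with?(parent) && key.length == parent.length + segment_width
  end
end

p children_of(KEYS, '00', 2)    # => ["0000", "0001"]
p children_of(KEYS, '0001', 2)  # => ["000100"]
```

The length check is what separates children from deeper descendants; the prefix check alone would also return 000000 and 000100 for parent 00.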
I would like to write an HBase shell query that returns the direct children of a node.
Right now I have the following:
scan 'tableName', STARTROW=>'00',
FILTER=>"PrefixFilter('00') AND RowFilter(=,'regexstring:^00.{1}$')"
which returns the children of 00, namely 0000 and 0001.
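The effect of the trailing `$` in that pattern can be illustrated outside HBase. The sketch below is only an approximation: Ruby's regex engine stands in for the one HBase uses, so it shows the matching logic and says nothing about performance. `child_pattern` and `SAMPLE_KEYS` are hypothetical names, and the quantifier `{2}` assumes the two-character segments of the sample listing (the query above uses `{1}`, which corresponds to one regex character per segment).

```ruby
SAMPLE_KEYS = %w[00 0000 000000 0001 000100].freeze

# Hypothetical helper: regex source for keys that extend `parent` by one segment.
# With anchor_end: false the pattern only constrains the start of the key.
def child_pattern(parent, segment_width, anchor_end: true)
  source = "^#{Regexp.escape(parent)}.{#{segment_width}}"
  source += '$' if anchor_end
  source
end

anchored   = child_pattern('00', 2)                     # "^00.{2}$"
unanchored = child_pattern('00', 2, anchor_end: false)  # "^00.{2}"

p SAMPLE_KEYS.grep(Regexp.new(anchored))    # => ["0000", "0001"]
p SAMPLE_KEYS.grep(Regexp.new(unanchored))  # => ["0000", "000000", "0001", "000100"]
```

Without the end anchor the pattern accepts any key that is at least one segment longer than the parent, which is why the grandchildren 000000 and 000100 appear in the unanchored result.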
I have several questions:
1. If I remove the $ anchor, performance improves dramatically (from 2 seconds to 0.2 seconds on a local VM), but I also get results I don't need (000000 and 000100). Is there a reason for this dramatic slowdown when the $ is present? I would expect the regex to be just an additional filter on an already narrowed-down list.
2. Is there a way to filter by the length of the row key? Then I could ditch the regex and use only startrow/endrow. This has to be done in the HBase shell; for example, something like FILTER=>"RowKeyLengthFilter(4)".
3. I cannot use the word (\w) or digit (\d) classes in the regex string. Is this a limitation of the HBase shell? I also tried [[:alnum:]] and [[:digit:]] (thanks to Yunnosch for the suggestion).
version = 1.1.0.1, r4de7d45cb593f98ae5d020080cbc7116d3e9d9a0, Sun May 17 12:52:10 PDT 2015
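For the startrow/endrow idea in question 2, the stop row that bounds a prefix can be derived by incrementing the last byte of the prefix. The following is a sketch under that assumption; `stop_row_for` is a hypothetical helper, not part of the HBase shell. It relies on row keys being ordered byte-wise and on STOPROW being exclusive in a scan.

```ruby
# Hypothetical helper: smallest row key that sorts after every key
# starting with `prefix` (byte-wise ordering). Trailing 0xFF bytes cannot
# be incremented, so they are dropped first. Returns nil when no upper
# bound exists (the prefix is empty or consists only of 0xFF bytes).
def stop_row_for(prefix)
  bytes = prefix.bytes
  bytes.pop while !bytes.empty? && bytes.last == 0xFF
  return nil if bytes.empty?
  bytes[-1] += 1
  bytes.pack('C*')
end

start_row = '00'
stop_row  = stop_row_for(start_row)

# Prints the scan command that would cover 00 and all of its descendants.
puts "scan 'tableName', {STARTROW => '#{start_row}', STOPROW => '#{stop_row}'}"
```

This range still contains the node itself and every descendant, so a check on the key length (by filter or on the client) would still be needed to keep only the direct children.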