
I want to scan an HBase table from the HBase shell and return only the rows whose key matches some pattern.

For example, I have the following table data:

    row:r1_t1  column:cf:a, timestamp=1461911995948,value=v1
    row:r2_t2  column:cf:a, timestamp=1461911995949,value=v2
    row:s1_t1  column:cf:a, timestamp=1461911995950,value=q1
    row:s2_t2  column:cf:a, timestamp=1461911995951,value=q2

Based on the above data, I want to find the rows whose key contains 't1':

    row:r1_t1  column:cf:a, timestamp=1461911995948,value=v1
    row:s1_t1  column:cf:a, timestamp=1461911995950,value=q1

I know I can scan the table with PrefixFilter, but that only returns the rows whose key starts with the specified prefix.

    scan 'test', {FILTER => "PrefixFilter('s')"}

Is there a similar way to scan the table, filtering rows by a pattern that matches in the middle of the row key?

mrsrinivas
Adrian Muntean

1 Answer

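    hbase(main):003:0> scan 'test', {ENDROW => 't1'}

Note that ENDROW is an exclusive upper bound on the row key, so this limits the scan to keys that sort before 't1'; it does not match 't1' inside the key.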

In general, using a PrefixFilter on its own can be slow, because the scan starts at the beginning of the table and reads rows until it reaches the prefix.
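To make the difference between these options concrete, here is a plain-Ruby sketch that imitates the selection logic on the sample row keys from the question. It makes no HBase calls; the method names are hypothetical and only mirror what ENDROW, PrefixFilter, and a substring RowFilter would select. The substring check is case-insensitive because, as far as I know, HBase's SubstringComparator compares that way.

```ruby
# Plain-Ruby imitation of three row-selection strategies (no HBase calls).
# Row keys are the sample keys from the question.
ROW_KEYS = %w[r1_t1 r2_t2 s1_t1 s2_t2].freeze

# ENDROW: keep keys that sort strictly before the stop row (exclusive bound).
def before_end_row(keys, end_row)
  keys.select { |key| key < end_row }
end

# PrefixFilter: keep keys that start with the prefix.
def with_prefix(keys, prefix)
  keys.select { |key| key.start_with?(prefix) }
end

# RowFilter + SubstringComparator: keep keys that contain the substring
# (compared case-insensitively in this sketch).
def containing(keys, substring)
  keys.select { |key| key.downcase.include?(substring.downcase) }
end

p before_end_row(ROW_KEYS, 't1')  # all four keys: 'r...' and 's...' sort before 't1'
p with_prefix(ROW_KEYS, 's')      # ["s1_t1", "s2_t2"]
p containing(ROW_KEYS, 't1')      # ["r1_t1", "s1_t1"]
```

Only the substring variant returns the two rows asked for in the question.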

You can also use RowFilter with a SubstringComparator, as shown below:

    hbase(main):003:0> import org.apache.hadoop.hbase.filter.CompareFilter
    hbase(main):005:0> import org.apache.hadoop.hbase.filter.SubstringComparator
    hbase(main):006:0> scan 'test', {FILTER => org.apache.hadoop.hbase.filter.RowFilter.new(CompareFilter::CompareOp.valueOf('EQUAL'),SubstringComparator.new("searchkeyword"))}
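The shell also accepts a filter written as a string in HBase's filter language, which avoids the imports. Below is a minimal sketch: `substring_row_filter` is a hypothetical helper (not part of HBase) that builds such a string using the `=` operator and the `substring:` comparator prefix. It assumes the keyword contains no single quotes.

```ruby
# Hypothetical helper (not part of HBase): builds a filter-language string
# that corresponds to RowFilter(EQUAL, SubstringComparator(keyword)).
# Assumes the keyword contains no single quotes.
def substring_row_filter(keyword)
  "RowFilter(=, 'substring:#{keyword}')"
end

puts substring_row_filter('t1')
# In the HBase shell (which is JRuby), the string can be passed to FILTER:
#   scan 'test', {FILTER => substring_row_filter('t1')}
```

Writing the string directly works as well, for example `scan 'test', {FILTER => "RowFilter(=, 'substring:t1')"}`.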
user3190018
Ram Ghadiyaram
  • When I am using this method on a large table I get `ERROR: Call id=14, waitTime=60001, operationTimeout=60000 expired.` Is there a way of increasing waitTime? – Adrian Muntean May 05 '16 at 09:47
  • In hbase-site.xml you can try increasing hbase.rpc.timeout to 1800000 – Ram Ghadiyaram May 05 '16 at 10:05
  • I would refer you to https://www.safaribooksonline.com/library/view/hbase-the-definitive/9781491905845/ch04.html . In general, if you have a huge amount of data in your table and want to search for rows by some pattern, I would suggest FuzzyRowFilter using the Java API (see the example in the link). It takes two parts: the fixed (known) part of the row key and the variable (unknown) part. This handles huge data efficiently. – Ram Ghadiyaram May 05 '16 at 10:20
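The fixed/variable idea behind FuzzyRowFilter can be sketched in plain Ruby. This is a simplified imitation, not HBase's implementation: `fuzzy_match?` is a hypothetical name, and the sketch assumes fixed-length row keys of the same length as the pattern. In the mask, 0 marks a fixed position that must match and 1 marks a position that may hold any byte.

```ruby
# Simplified plain-Ruby imitation of fuzzy row-key matching (no HBase calls).
# mask[i] == 0: position i is fixed and must equal pattern[i].
# mask[i] == 1: position i may hold any byte.
# Assumes fixed-length row keys with the same length as the pattern.
def fuzzy_match?(row_key, pattern, mask)
  return false unless row_key.bytesize == pattern.bytesize
  return false unless mask.size == pattern.bytesize

  row_key.bytes.each_with_index.all? do |byte, i|
    mask[i] == 1 || byte == pattern.getbyte(i)
  end
end

ROWS = %w[r1_t1 r2_t2 s1_t1 s2_t2].freeze
PATTERN = '??_t1'        # '?' is only a placeholder; ignored where the mask is 1
MASK = [1, 1, 0, 0, 0]   # first two bytes vary, '_t1' is fixed

p ROWS.select { |key| fuzzy_match?(key, PATTERN, MASK) }  # ["r1_t1", "s1_t1"]
```

Because the fixed positions are known up front, the real filter can skip ahead in the table instead of testing every row, which is why it suits large tables with fixed-length keys.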