I am trying to pass a list of regexes to the columns argument in my HappyBase scan calls. This is because my column names are built by dynamically appending IDs, which I don't have access to at scan time.
Is this possible?
HappyBase author here.
According to the Thrift API, you can pass regular expressions in the columns argument for the ScannerOpen() API family (see http://svn.apache.org/viewvc/hbase/trunk/hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift?view=markup#l717). However, the Thrift API used by HappyBase is ScannerOpenWithScan(), which uses the TScan struct (see http://svn.apache.org/viewvc/hbase/trunk/hbase-thrift/src/main/resources/org/apache/hadoop/hbase/thrift/Hbase.thrift?view=markup#l141), which does not contain any remark about regular expressions. Actually I don't know (without testing) whether this works.
A more flexible and powerful way is to specify a filter string using the filter argument to happybase.Table.scan(). See http://hbase.apache.org/book/thrift.html for the filter string syntax. In your case, something like "ColumnPrefixFilter('theprefix')" should do the trick. See http://happybase.readthedocs.org/en/latest/api.html#happybase.Table.scan for the HappyBase API.
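Since the question asks about a list of regexes rather than a single prefix, here is a minimal sketch of building such a filter string. It assumes the QualifierFilter and the regexstring comparator from the HBase filter language, combined with OR; I have not tested this against a live cluster. The helper names qualifier_regex_filter and scan_matching_columns are made up for illustration and are not part of HappyBase.

```python
def qualifier_regex_filter(patterns):
    """Build a filter string that matches any of the given qualifier regexes.

    Assumes the HBase filter language's QualifierFilter with the
    'regexstring' comparator; clauses are combined with OR.
    """
    if not patterns:
        raise ValueError("at least one pattern is required")
    clauses = []
    for pattern in patterns:
        # The filter language escapes a single quote inside an argument
        # by doubling it.
        escaped = pattern.replace("'", "''")
        clauses.append("QualifierFilter(=, 'regexstring:%s')" % escaped)
    return " OR ".join(clauses)


def scan_matching_columns(host, table_name, patterns, row_prefix=None, limit=None):
    """Hypothetical wrapper: scan a table, keeping only matching columns."""
    import happybase  # imported here so the builder above works without it

    connection = happybase.Connection(host)
    try:
        table = connection.table(table_name)
        return list(table.scan(row_prefix=row_prefix,
                               filter=qualifier_regex_filter(patterns),
                               limit=limit))
    finally:
        connection.close()
```

Note that the qualifier regex is matched against the column qualifier only, so the column family is not part of the pattern. If the dynamic part is always a suffix, the plain ColumnPrefixFilter above is simpler.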
I am not familiar with HBase's syntax. Here is the HappyBase Python code I used, and it works for me. Thanks to Wouter Bolsterlee! Unlike the 'columns' argument, you don't have to include the column family in 'ColumnPrefixFilter'.
    import happybase

    pool = happybase.ConnectionPool(size=3, host='172.xx.xx.xx')
    with pool.connection() as conn1:
        hbaseTable = conn1.table('HBase_table_name_here')
        # Restrict rows by key prefix and columns by qualifier prefix.
        for rowKey, rowData in hbaseTable.scan(row_prefix='year-2015-',
                                               filter="ColumnPrefixFilter('month-06')",
                                               limit=6):
            print(rowData)