I am trying to filter rows from an HBase table using HappyBase. Specifically, I want the rows whose 'id' is less than 1000:

for key, data in graph_table.scan(filter="SingleColumnValueFilter('cf', 'id', <, 'binary:1000')"):
    print key, data

The scan returns only these rows:

<http://ieee.rkbexplorer.com/id/publication-d2a6837e67d808b41ffe6092db50f7cc> {'cf:type': 'v', 'cf:id': '100', 'cf:label': '<http://www.aktors.org/ontology/portal#Proceedings-Paper-Reference>'}
<http://www.aktors.org/ontology/date#1976> {'cf:type': 'v', 'cf:id': '1', 'cf:label': '<http://www.aktors.org/ontology/support#Calendar-Date>'}
<http://www.aktors.org/ontology/date#1985> {'cf:type': 'v', 'cf:id': '10', 'cf:label': '<http://www.aktors.org/ontology/support#Calendar-Date>'}

The table contains rows with 'id' values from 1 to 1000. If I write the same filter in Java with the HBase client library, it works fine, converting the integer value with the Bytes.toBytes() function.

Thank you.

  • Can you clarify: you only get results for id=1|10|100 using this search, while you have values from 1-1000 in the table? – Suman Apr 23 '14 at 16:23
  • Yes, I have values from 1 to more than 7000, in the previous post I wanted to remark that 996 results were missing. – Mikel Emaldi Manrique Apr 23 '14 at 16:40
  • SO is a Q&A site, therefore it would really help if you could bring both question and answer into the appropriate format. You can post an answer to your own question, that's perfectly ok. If that's not possible, consider removing the question entirely. – JensG Apr 23 '14 at 20:31
  • Yes, I was waiting for the 8-hour restriction to expire so I could answer my own question properly. Thanks @JensG for your advice, I will take it into account in future questions ;-) – Mikel Emaldi Manrique Apr 24 '14 at 07:41

1 Answer

The problem was that I was storing the integers as decimal strings, so the filter compared them byte by byte as text rather than as numbers. The fix is to store them as packed binary values:

table.put(key, {'cf:id': struct.pack(">q", value)})

When querying the table, the value in the filter has to be packed the same way:

for key, data in graph_table.scan(filter="SingleColumnValueFilter('cf', 'id', <, 'binary:%s', true, false)" % struct.pack(">q", 1000)):
    print key, data

Finally, unpack the value when reading the result:

value = struct.unpack(">q", data['cf:id'])[0]
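
To see why the string version matched only ids 1, 10 and 100, here is a small standalone sketch that needs no HBase connection. It assumes the 'binary:' comparator orders values byte by byte (lexicographically), and it simulates that ordering with plain Python bytes comparison. The names ids, as_text and as_packed are made up for the illustration.

```python
import struct

# Hypothetical ids, mirroring the 1..1000 range described in the question.
ids = range(1, 1001)

# Ids stored as decimal strings: a byte-wise comparison against b"1000"
# only accepts values that are proper prefixes of "1000".
as_text = [i for i in ids if str(i).encode("ascii") < b"1000"]
print(as_text)

# Ids stored as 8-byte big-endian integers: for non-negative values,
# byte-wise order matches numeric order.
limit = struct.pack(">q", 1000)
as_packed = [i for i in ids if struct.pack(">q", i) < limit]
print(len(as_packed))

# Round trip: unpacking returns the original integer.
restored = struct.unpack(">q", struct.pack(">q", 42))[0]
print(restored)
```

Note that this ordering only holds for non-negative values: a negative number packed with ">q" has its high bit set, so a byte-wise comparison would sort it after every positive number.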

Thank you very much.