
UPDATE: This only happens with the Google Cloud Bigtable Emulator, not with actual development or production Bigtable instances (Google Cloud SDK 149.0.0).

I'm trying to filter rows with a row-key regex filter. Everything else works like a charm (filter by prefix, by key start and stop range, by key, by keys), but I can't get it working when passing a RowKeyRegexFilter as the filter: it just returns all the keys, as if it were a scan with no key filter at all:

# all the boilerplate to create a happybase connection skipped
from google.cloud.bigtable.row_filters import RowKeyRegexFilter

t = connection.table("sometable")
t.put(
    b'row1',
    {
       b"family1:col2": b".1",
       b"family2:col2": b".12",
    }
)
t.put(
    b'row2',
    {
       b"family1:col2": b".2",
       b"family2:col2": b".22",
    }
)
t.put(
    b'row3',
    {
       b"family1:col2": b".3",
       b"family2:col2": b".32",
    }
)
rows = t.scan(
    filter=RowKeyRegexFilter(b'.+3')
)
print(len([i for i in rows]))

That always prints 3, even with a regex like (nomatchforsure)+ that cannot match any of the keys. I could not find any documentation with a working example. What surprises me most is that google.cloud.happybase.table.Table.rows always filters by row key using RowKeyRegexFilter, yet passing regexes to the rows method instead of real row keys doesn't give regex filtering either. You can see it

here: https://github.com/GoogleCloudPlatform/google-cloud-python-happybase/blob/master/src/google/cloud/happybase/table.py#L197

and here: https://github.com/GoogleCloudPlatform/google-cloud-python-happybase/blob/master/src/google/cloud/happybase/table.py#L971

Any help on this would be very much appreciated.

danius
  • I would suggest creating a github issue for this. – Solomon Duskis Mar 30 '17 at 02:55
  • I already posted it to Google Groups, but it's ok @SolomonDuskis, I'm adding it to GitHub as well. I feel like the only guy in the world trying to do this :( – danius Mar 30 '17 at 03:01
  • https://github.com/GoogleCloudPlatform/google-cloud-python-happybase/issues/21 Let's see if someone sees this. It looks like a bug, but maybe I'm using it wrong; the thing is, I found no documentation or example on the web after hours of googling – danius Mar 30 '17 at 03:07

1 Answer


UPDATE: This is actually noted in the docs, as pointed out by @gary-elliott: https://cloud.google.com/bigtable/docs/emulator#filters "Regular expressions must contain only valid UTF-8 characters, unlike the actual Cloud Bigtable service which can process regular expressions as arbitrary bytes." However, something as simple as (notmatchforsure)+ does not work either, even though it seems to contain only valid UTF-8 characters. From my testing, I would say regex filters on the emulator are not merely limited but generally not working. Anyway, the limitation is correctly flagged in the docs.
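To rule out the documented limitation for a given pattern, one can check whether its bytes are valid UTF-8. This is only a minimal sketch: is_valid_utf8 is a hypothetical helper name, not part of any Bigtable library, and passing the check does not mean the emulator will apply the filter.

```python
def is_valid_utf8(pattern: bytes) -> bool:
    """Return True if the regex pattern bytes decode cleanly as UTF-8."""
    try:
        pattern.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# The patterns from the question are plain ASCII, so they are valid UTF-8.
ascii_ok = is_valid_utf8(b".+3")
group_ok = is_valid_utf8(b"(notmatchforsure)+")

# A lone 0xFF byte can never appear in valid UTF-8, so a binary pattern
# like this falls under the documented emulator limitation.
binary_ok = is_valid_utf8(b"\xff.+")
```

Both patterns I tried pass this check, which is why the behavior still looks like more than the documented restriction to me.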

The actual problem is a bug in the emulator; I updated this answer to avoid misleading feedback. The solution was to create a development instance for testing the code. So for now, if you want to develop with regex filters in Bigtable, you have to create (and pay for...) at least a development instance ($0.65/hour, $0.17/GB at the time of writing). I hope this helps, since anyone expecting to play with the emulator could get stuck for hours as I did.
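If paying for an instance is not an option, a possible stopgap while on the emulator is to scan without the filter and apply the regex on the client. The sketch below rests on some assumptions: scan_matching is a hypothetical helper, the list of tuples stands in for what t.scan() yields (row key, row dict), Python's re module is used in place of the RE2 engine Bigtable uses, and the pattern is required to match the whole row key. It is not a substitute for server-side filtering on large tables.

```python
import re


def scan_matching(rows, pattern):
    """Yield (key, data) pairs whose entire row key matches the pattern.

    rows is any iterable of (row_key_bytes, row_dict) pairs, such as the
    result of an unfiltered table scan.
    """
    compiled = re.compile(pattern)
    for key, data in rows:
        # fullmatch: the pattern has to cover the whole key (assumption).
        if compiled.fullmatch(key):
            yield key, data


# Stand-in for the rows written in the question; with a real connection
# this would be the result of t.scan() without a filter.
fake_scan = [
    (b"row1", {b"family1:col2": b".1", b"family2:col2": b".12"}),
    (b"row2", {b"family1:col2": b".2", b"family2:col2": b".22"}),
    (b"row3", {b"family1:col2": b".3", b"family2:col2": b".32"}),
]

matched_keys = [key for key, _ in scan_matching(fake_scan, rb".+3")]
no_match_keys = [key for key, _ in scan_matching(fake_scan, rb"(nomatchforsure)+")]
```

Here matched_keys contains only b'row3' and no_match_keys is empty, which is the result I expected from the server-side filter in the first place.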

danius
  • This particular limitation is noted in the documentation for the emulator: https://cloud.google.com/bigtable/docs/emulator. It only affects binary regex filters. – Gary Elliott Mar 31 '17 at 01:11
  • You are right, my bad, I misunderstood it. In practice they don't work in most cases, but it really is noted in the docs. – danius Mar 31 '17 at 01:28